Abstract

The similarity of two polygonal curves can be measured using the Fréchet distance.
We introduce the notion of a more robust Fréchet distance, where one is allowed to
shortcut between vertices of one of the curves. This is a natural approach for handling
noise, in particular batched outliers. We compute a constant factor approximation to
the minimum Fréchet distance over all possible such shortcuts, in near linear time, if
the curve is c-packed and the number of shortcuts is small.

To facilitate the new algorithm we develop several new tools:

(A) A data-structure for preprocessing a curve (not necessarily c-packed) that
    supports $(1+\varepsilon)$-approximate Fréchet distance queries between a subcurve
    (of the original curve) and a line segment.
(B) A near linear time algorithm that computes a permutation of the vertices
    of a curve, such that any prefix of $2k-1$ vertices of this permutation forms
    an optimal approximation (up to a constant factor) to the original curve
    compared to any polygonal curve with $k$ vertices, for any $k > 0$.
(C) A data-structure for preprocessing a curve that supports $O(1)$-approximate
    Fréchet distance queries between a subcurve and a query polygonal curve.
    The query time depends quadratically on the complexity of the query curve,
    and only (roughly) logarithmically on the complexity of the original curve.

To our knowledge, these are the first data-structures to support this kind of query
efficiently.



1 Introduction 

Comparing the shapes of polygonal curves - or sequenced data in general - is a challenging
task that arises in many different contexts. The Fréchet distance and its variants (e.g.,
dynamic time-warping [KP99]) have been used as similarity measures in various applications
such as matching of time series in databases [KKS05], comparing melodies in music
information retrieval [SGHS08], matching coastlines over time [MDBH06], as well as
in map-matching of vehicle tracking data [BPSW05, WSP06], and moving objects analysis
[BBG08a, BBG+08b]. Informally, the Fréchet distance between two curves is defined as the
maximum distance a point on the first curve has to travel as this curve is being continuously
deformed into the second curve. Another common description uses the following "leash"
metaphor: Imagine traversing the two curves simultaneously, where at each point in time the
two positions are connected by a leash of a fixed length. During the traversal you can vary
the speeds on both curves independently, but not walk backwards. The Fréchet distance
corresponds to the minimum length of a leash that permits such a traversal.
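
The leash metaphor can be made concrete with a small sketch. The code below computes the discrete Fréchet distance, a simplified variant in which the two walkers only stop at vertices; it is not the continuous distance studied in this paper, and the function name `discrete_frechet` is chosen here purely for illustration.

```python
import math


def discrete_frechet(P, Q):
    """Discrete Frechet distance between two vertex sequences P and Q.

    Assumption: the walkers jump from vertex to vertex, so this only
    approximates the continuous leash length described in the text.
    """
    n, m = len(P), len(Q)
    # dp[i][j] = shortest leash that lets the walkers reach (P[i], Q[j])
    # from (P[0], Q[0]) without ever walking backwards.
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(P[i], Q[j])
            if i == 0 and j == 0:
                best_prev = 0.0
            elif i == 0:
                best_prev = dp[0][j - 1]
            elif j == 0:
                best_prev = dp[i - 1][0]
            else:
                best_prev = min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
            # The leash must cover the current pair and everything before it.
            dp[i][j] = max(d, best_prev)
    return dp[n - 1][m - 1]
```

A single far-away vertex on one curve already dominates the result, which illustrates the sensitivity to outliers discussed next.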

The Fréchet distance captures perceptual and geographical similarity under non-affine
distortions as well as spatio-temporal similarity [MSSZZ11]. However, it is very sensitive
to local noise, and often not used in practice for that reason. Unlike similarity measures
such as the root-mean-square deviation (RMSD), which averages over a set of similarity
values, and dynamic time warping, which minimizes the sum of distances along the curves,
the Fréchet distance is a so-called bottleneck measure and can therefore be affected to an
extent which is generally unrelated to the relative amount of noise across the curves.
In practice, curves might be generated by physical tracking devices, such as GPS, which is
known to be inaccurate when the connection to the satellites is temporarily disturbed due
to atmospheric conditions or reflections of the positioning signal on high buildings. Such
inaccurate data points are commonly referred to as "outliers". Note that outliers come in
batches if they are due to such a temporary external condition. Similarly, in computer
vision applications, the silhouette of an object could be partially occluded, and in sound
recordings, outliers may be introduced due to background sounds or breathing. Detecting
outliers in time series has been studied extensively in the literature [MMY06]. One
may also be interested in outliers as a deviation from a certain expected behavior or because
they carry some meaning. It could be, for instance, that trajectories of two hikers deviate
locally, because one hiker chose to take a detour to a panoramic view point, see the example
in the figure above. Outlier detection is inherently non-trivial if not much is known about
the underlying probability distributions and the data is sparse [AY05]. We circumvent this
problem in the computation of the Fréchet distance by minimizing over all possibilities for
outlier-removal. In a sense, our approach is similar to computing a certain notion of partial
similarity.



The task at hand. We are given two polygonal curves X and Y, which we perceive as
a sequence of linearly interpolated measurement points. We believe that Y is similar to X
but it might contain considerable noise that is occluding this similarity. That is, it might
contain erroneous measurement points (outliers), which need to be ignored when assessing
the similarity. We would like to apply a few edit operations to Y so that it becomes as similar
to X as possible, in the process hopefully removing the noise in Y and judging how similar
it really is to X. To this end, we - conceptually - remove subsets of measurement points,
which we suspect to be outliers, and minimize over all possibilities for such a removal. This
is formalized in the shortcut Fréchet distance.

Shortcut Fréchet distance. A shortcut replaces a subcurve between two vertices by
a straight segment that connects these vertices. The part being shortcut is not ignored,
but rather the new curve with the shortcuts has to be matched entirely to the other curve
under the Fréchet distance. As a concrete example, consider the figure below. The Fréchet
distance between X and Y is quite large, but after we shortcut the outlier "bump" in Y, the
resulting new curve Z has a considerably smaller Fréchet distance to X. We are interested
in computing the minimum such distance using k shortcuts, where k is a small constant.
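
As a rough illustration of what a shortcut does to the vertex sequence of Y, the sketch below applies a list of vertex-to-vertex shortcuts to a polygonal curve. The function name `apply_shortcuts` and the representation of shortcuts as index pairs are assumptions made for this example only.

```python
def apply_shortcuts(Y, shortcuts):
    """Return the vertex list of Y after applying vertex-to-vertex shortcuts.

    Y         -- list of vertices (tuples) of the polygonal curve
    shortcuts -- list of index pairs (i, j) with i < j, sorted along the
                 curve and non-overlapping (hypothetical input format)
    Each shortcut replaces the subcurve Y[i..j] by the straight segment
    from Y[i] to Y[j], i.e., it drops the vertices strictly between them.
    """
    result = []
    pos = 0  # index of the first vertex not yet copied
    for i, j in shortcuts:
        if not (pos <= i < j < len(Y)):
            raise ValueError("shortcuts must be sorted and non-overlapping")
        result.extend(Y[pos:i + 1])  # keep the subcurve up to vertex i
        pos = j                      # continue from vertex j
    result.extend(Y[pos:])
    return result
```

For example, shortcutting the single "bump" vertex of `[(0, 0), (1, 0), (1, 5), (2, 0), (3, 0)]` with the pair `(1, 3)` leaves the four collinear vertices.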

Naturally, there are many other possibilities to try and tackle the task at hand, for
example: (i) considering all possible shortcuts together and allowing an unbounded number
of shortcuts, (ii) allowing shortcuts on both curves, (iii) using shortcuts between
vertices that are close by on the curve, (iv) partial Fréchet distance (mentioned above),
(v) allowing shortcuts starting and ending in the middle of edges, etc.

If one is interested in (i) + (iii) then the problem turns into a map-matching problem,
where the start and end points are fixed and the graph is formed by the curve and its eligible
shortcuts. For this problem, results can be found in the literature [CDG+11, AERW03]. A
recent result by Har-Peled and Raichel [HR11] is applicable to the variant where one allows
such shortcuts on both curves, i.e., (i) + (ii) + (iii). Note that allowing shortcuts on both
curves does not yield a meaningful measure, if the shortcuts are not sufficiently restricted.

In this paper, we concentrate on the k-shortcut Fréchet distance because it seems to be
more natural than some of the variants mentioned above, and computing it efficiently seems
like a first step in understanding how to solve some of the more difficult variants, e.g., (v).
Furthermore, we also discuss efficient solutions for the variant in (i). Surprisingly, computing
this "simplistic" shortcut Fréchet distance is quite challenging, especially if one is interested
in an efficient algorithm.

Input model. A curve Y is c-packed if the total length of Y inside any ball is bounded
by c times the radius of the ball. Intuitively, c-packed curves behave reasonably in any
resolution. The boundary of convex polygons, algebraic curves of bounded maximum degree,
the boundary of $(\alpha, \beta)$-covered shapes [Efr05], and the boundary of $\gamma$-fat shapes [dB08] are
all c-packed. Interestingly, the class of c-packed curves is closed under simplification, see
[DHW10]. This makes them attractive for efficient algorithmic manipulation.
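
To make the definition of c-packedness tangible, here is a minimal sketch that measures the length of a polygonal curve inside a ball and derives a lower bound on c from a sample of balls. Both function names are hypothetical, and the bound is only a lower bound: the definition quantifies over all balls, whereas the sketch tests balls centered at the vertices for a caller-supplied list of radii.

```python
import math


def length_inside_ball(curve, center, radius):
    """Total length of the polygonal curve lying inside the given ball."""
    total = 0.0
    for p, q in zip(curve, curve[1:]):
        d = [b - a for a, b in zip(p, q)]       # edge direction q - p
        f = [a - c for a, c in zip(p, center)]  # p relative to the center
        A = sum(x * x for x in d)
        if A == 0.0:
            continue  # degenerate edge of length zero
        B = 2.0 * sum(x * y for x, y in zip(d, f))
        C = sum(x * x for x in f) - radius * radius
        disc = B * B - 4.0 * A * C
        if disc <= 0.0:
            continue  # the supporting line misses (or only touches) the ball
        root = math.sqrt(disc)
        # Clip the parameter interval inside the ball to the edge [0, 1].
        t0 = max(0.0, (-B - root) / (2.0 * A))
        t1 = min(1.0, (-B + root) / (2.0 * A))
        if t1 > t0:
            total += (t1 - t0) * math.sqrt(A)
    return total


def packedness_lower_bound(curve, radii):
    """Largest observed length/radius ratio over the sampled balls."""
    return max(length_inside_ball(curve, v, r) / r
               for v in curve for r in radii)
```

A straight segment yields a ratio of at most 2 for every ball, while a curve that zigzags many times inside a small region produces a large ratio.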

Another input model which is commonly used is called low density [dBKSV02]. We call
a set of line segments $\phi$-dense, if for any ball the number of line segments that intersect
this ball and which are longer than the radius of the ball is bounded by $\phi$. It is easy to see
by a simple packing argument that c-packed curves are $O(c)$-dense.

Informal restatement of the problem. In the parametric space of the two input curves,
we are given a terrain defined over an $n \times n$ grid partitioning $[0,1]^2$, where the height at
each point is defined as the distance between the two associated points on the two curves.
As in the regular Fréchet distance, we are interested in finding a path between (0, 0) and
(1, 1) on the terrain, such that the maximum height on the path does not exceed some $\delta$
(the minimum such $\delta$ is the desired distance). This might not be possible as there might be
"mountain chains" blocking the way. To overcome this, we are allowed to introduce tunnels
that go through such obstacles. Each of these tunnels connects two points that lie on the
horizontal lines of the grid, as these correspond to the vertices of one curve. Naturally, we
require that the starting and ending points of such a tunnel have height at most $\delta$ (the
current distance threshold being considered), and furthermore, that the price of such a tunnel
(i.e., the Fréchet distance between the corresponding shortcut and subcurve) is smaller than
$\delta$. Once we introduce these tunnels, we need to compute a monotone path from (0, 0) to
(1, 1) in the grid which uses at most k tunnels. Finally, we need to search for the minimum
$\delta$ for which there is a feasible solution.

Challenge and ideas. A priori there are potentially $O(n^2)$ horizontal edges of the grid
that might contain endpoints of a tunnel, and as such, there are potentially $O(n^4)$ different
families of tunnels that the algorithm might have to consider. A careful analysis of the
structure of these families shows that, in general, it is sufficient to consider one (canonical)
tunnel per family. Using c-packedness and simplification, we can reduce the number of
relevant grid edges to near linear. This in turn reduces the number of potential tunnels that
need to be inspected to $O(n^2)$. This is still insufficient to get a near linear time algorithm.
Surprisingly, we prove that if we are interested only in a constant factor approximation, for
every horizontal edge of the grid we need to inspect only a constant number of tunnels. Thus,
we reduce the number of tunnels that the algorithm needs to inspect to near linear. And yet
we are not done, as naively computing the price of a tunnel requires time near linear in the
size of the associated subcurve. To overcome this, we develop a new data-structure, so that
after preprocessing we can compute the price of a tunnel in polylogarithmic time per tunnel.
Now, carefully putting all these insights together, we get a near linear time algorithm for
the approximate decision version of the problem.

However, to compute the minimum $\delta$ for which the decision version returns true - which
is the shortcut Fréchet distance - we need to search over the critical values of $\delta$. To this
end, we investigate and characterize the critical values introduced by the shortcut version
of the problem. Using the decision procedure, we perform a binary search of several stages
over these values, in the spirit of [DHW10], to get the required approximation.

Our results 

(A) Computing the shortcut Fréchet distance. For a prescribed parameter $\varepsilon > 0$, we
    present an algorithm for computing a $(3+\varepsilon)$-approximation to the shortcut Fréchet
    distance between two given c-packed polygonal curves of total complexity n, see
    Definition 2.3 for the formal definition of the distance being approximated.
    If we allow an unbounded number of shortcuts, the running time of the new algorithm
    is near linear in n, with a polynomial dependence on c; see Theorem 4.11 for the exact
    result. We also present a variant of this algorithm that handles the case where we allow
    only k shortcuts, whose running time additionally depends linearly on k. In the analysis
    of these problems we use techniques developed by Driemel et al. in [DHW10] and follow
    the general approach used in parametric search of devising a decision procedure which
    is used to search over the critical events for the Fréchet distance. The shortcuts
    introduce a new type of critical event, which we analyze in Section 4.3. Furthermore,
    the algorithm uses a new data-structure (described next) that is interesting in its
    own right.
(B) Fréchet distance queries between a segment and a subcurve. We present a
    data-structure that preprocesses a given polygonal curve Y, such that given a query
    segment h, and two points p, p' on Y (and the edges containing them), it
    $(1+\varepsilon)$-approximates the Fréchet distance between h and the subcurve of Y between
    p and p'. Surprisingly, the data-structure works for any polygonal curve (not
    necessarily packed or dense), requires near linear preprocessing time and space, and
    can answer such queries in polylogarithmic time (ignoring the dependency on $\varepsilon$).
    See Theorem 6.7 for the exact result.
(C) Universal curve simplification. We show how to preprocess a polygonal curve in
    near-linear time and space, such that, given a number $k \in \mathbb{N}$, one can compute a
    simplification in $O(k)$ time which has $K = 2k-1$ vertices (of the original curve) and is
    optimal up to a constant factor with respect to the Fréchet distance to the original
    curve, compared to any curve which uses k vertices. Surprisingly, this is done by
    computing a permutation of the vertices of the input curve, such that this
    simplification is the subcurve defined by the first K vertices in this permutation.
    Namely, we compute an ordering of the vertices of the curve by their Fréchet
    "significance". See Theorem 7.7 for the exact result.
(D) Fréchet distance queries between a curve with k vertices and a subcurve.
    We use the above universal simplification to extend the data-structure in (B) to
    support queries with polygonal curves of multiple segments (as opposed to single
    segments) and obtain a constant factor approximation with polylogarithmic query time,
    see Theorem 7.9. The query time is quadratic in the query curve complexity and
    logarithmic in the input curve complexity.

Previous results. For two polygonal curves of total complexity n in the plane, their
Fréchet distance can be computed in $O(n^2 \log n)$ time [AG95]. It has been an open problem
to find a subquadratic time algorithm for computing the Fréchet distance for two curves.
For the problem of deciding whether the Fréchet distance between two curves is at most
a given value, a lower bound of $\Omega(n \log n)$ was given by Buchin et al. [BBK+07].
However, it is conjectured that the decision problem is 3SUM-hard [Alt09]. Recently, Driemel
et al. presented a near linear time $(1+\varepsilon)$-approximation algorithm for the Fréchet distance
assuming the curves are well behaved [DHW10]; that is, c-packed.

Buchin et al. [BBW09] showed how to compute the partial Fréchet distance under the $L_1$
and $L_\infty$ metric. Here, one fixes a threshold $\delta$, and computes the maximal length of subcurves
of the input curves that match under Fréchet distance $\delta$. The running time of their algorithm
is roughly $O(n^3 \log n)$. To the best of our knowledge the problem of computing the Fréchet
distance when one is allowed to introduce shortcuts has not been studied before.

For the problem of counting the number of subcurves that are within a certain Fréchet
distance, a recent result by de Berg et al. provides a data-structure to answer such queries
up to a constant approximation factor [MdB11].



Previous work on curve simplification. There is a large body of literature on curve
simplification. Since this is not the main subject of the paper, we only discuss a selection
of results which we consider most relevant, since they use the Fréchet distance as a quality
measure. Agarwal et al. [AHMW05] give a near-linear time approximation algorithm to
compute a simplification which is in Fréchet distance $\varepsilon$ to the original curve and whose size
is at most the size of the optimal simplification with error $\varepsilon/2$. Abam et al. [AdBHZ10] study
the problem in the streaming setting, where one wishes to maintain a simplification of the
prefix seen so far. Their algorithm achieves an $O(1)$ competitive ratio using $O(k^2)$ additional
storage and maintains a curve with 2k vertices which has a smaller Fréchet distance to the
prefix than the optimal Fréchet simplification with k vertices. Bereg et al. [BJW+08] give an
exact $O(n \log n)$ algorithm that minimizes the number of vertices in the simplification, but
using the discrete Fréchet distance, where only distances between the vertices of the curves
are considered.

Organization. In Section 2 we describe some basic definitions and results. In particular,
the formal problem statement and the definition of the k-shortcut Fréchet distance between
two curves are given in Section 2.1. We also discuss some basic tools needed for the algorithm.
In Section 3 we describe the approximation algorithm. Here, we devise an approximate
decision procedure in Section 3.2 that is used in the main algorithm, described in Section 3.3,
to search over an approximate set of candidate values. The analysis of this algorithm is in
Section 4. Since the shortcuts introduce a new set of candidate values, we provide an
elaborate study of these new events in Section 4.3. The main result for approximating the
shortcut Fréchet distance is stated in Theorem 4.11. In Section 5 we describe how to extend
the above algorithm to handle the k-shortcut Fréchet distance (see Theorem 5.5). In the
remaining sections we describe the new data-structures. In Section 6 we describe a
data-structure for a fixed curve that answers queries for the Fréchet distance between a
subcurve and a given segment. In Section 7 we use this data-structure to compute the
universal curve simplification. The extension to query curves with more than two vertices is
described in Section 7.2. We conclude (exhausted) with discussion and some open problems in
Section 8. In Appendix A we provide some (relatively standard) background on the Fréchet
distance and the notion of the free space diagram.

2 Preliminaries 

Notation. For a closed set $C \subseteq \mathbb{R}^d$, and a point $p \in \mathbb{R}^d$, let $\mathrm{nn}(p, C)$ denote the
closest point to p in C. The distance of p to the set C is
$d(p, C) = \min_{q \in C} \|p - q\| = \|p - \mathrm{nn}(p, C)\|$. For a set of numbers U, an atomic interval
is a maximal interval of $\mathbb{R}$ that does not contain any point of U in its interior.

A curve X is a continuous mapping from [0, 1] to $\mathbb{R}^d$, where $X(t)$ denotes the point on
the curve parameterized by $t \in [0, 1]$. Given two curves X and Y that share an endpoint, let
X + Y denote the concatenated curve. For a curve Y, the segment connecting its endpoints
is its spine, denoted by spine(Y). We denote with $X[x, x']$ the subcurve of X from $X(x)$
to $X(x')$ and with $X\langle p, p' \rangle$ the subcurve of X between the two points $p, p' \in X$. Similarly,
$\overline{Y}[y, y']$ denotes the line segment between the points $Y(y)$ and $Y(y')$; we call this a shortcut
if $Y(y)$ and $Y(y')$ are vertices of Y.



2.1 The k-shortcut Fréchet distance



Standard definitions of the Fréchet distance and the free space diagram are delegated to
Appendix A, see also [DHW10]. We reproduce here the definition of the Fréchet distance.

Definition 2.1. Given two curves X and Y in $\mathbb{R}^d$, the Fréchet distance between them is

\[ d_{\mathcal{F}}(X, Y) = \min_{f : [0,1] \to [0,1]} \; \max_{\alpha \in [0,1]} \| X(f(\alpha)) - Y(\alpha) \| , \]

where f is an orientation-preserving reparameterization of X. Furthermore, if $g : X \to Y$ is
a homeomorphism between X and Y, we define the width of g as

\[ d_g(X, Y) = \max_{\alpha \in [0,1]} \| X(\alpha) - g(X(\alpha)) \| . \]

Definition 2.2. Given a polygonal curve Y, we refer to any order-preserving concatenation
of k + 1 non-overlapping subcurves of Y, that has straight line segments connecting the
endpoints of the subcurves, as a k-shortcut curve of Y. Formally, for values
$0 \le y_1 \le y_2 \le \dots \le y_{2k} \le 1$, such that each $Y(y_i)$ is a vertex of Y, the shortcut curve is
defined as

\[ Y[0, y_1] + \overline{Y}[y_1, y_2] + Y[y_2, y_3] + \dots + \overline{Y}[y_{2k-1}, y_{2k}] + Y[y_{2k}, 1]. \]

Definition 2.3. Given two polygonal curves X and Y, we define their k-shortcut Fréchet
distance as the minimal Fréchet distance between the curve X and any k'-shortcut curve
of Y, where $0 \le k' \le k$. We denote it with $d_{\mathcal{S}}(k, X, Y)$. If we do not want to constrain the
maximal number of shortcuts, we set $k = \infty$ and simply refer to it as the shortcut Fréchet
distance. We denote it with $d_{\mathcal{S}}(X, Y)$. Note that in both versions we only allow one of the
curves to be shortcut and the shortcuts are between vertices of Y.

Free space. Recall that the $\delta$-free space of X and Y is a subset of the parametric space,
defined as

\[ \mathcal{D}_{\le\delta}(X, Y) = \bigl\{ (x, y) \in [0,1]^2 \;\big|\; \|X(x) - Y(y)\| \le \delta \bigr\} , \]

as described in Appendix A, and that we denote with $\mathcal{N}_{\le\delta}(X, Y)$ the total number of grid
cells that have a non-empty intersection with $\mathcal{D}_{\le\delta}(X, Y)$. For a point $p = (x_p, y_p) \in [0,1]^2$,
we define its elevation to be

\[ d(p) = \| X(x_p) - Y(y_p) \| . \tag{1} \]

Given a finite set of points P in the parametric space of X and Y, the set of points that are
reachable by an (x, y)-monotone path from a point in P that stays inside this free space is
the locally reachable free space from P (denoted by $\mathcal{R}_{\le\delta}(P)$). The k-reachable free
space $\mathcal{R}^{k}_{\le\delta}(X, Y)$ is

\[ \mathcal{R}^{k}_{\le\delta}(X, Y) = \bigl\{ p = (x_p, y_p) \in [0,1]^2 \;\big|\; d_{\mathcal{S}}(k, X[0, x_p], Y[0, y_p]) \le \delta \bigr\} . \]

This is the set of points that have an (x, y)-monotone path from (0, 0) that stays inside the
free space and otherwise uses at most k tunnels, which are defined in the next subsection.
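
The elevation of Eq. (1) and the membership test for the $\delta$-free space are simple to evaluate for polygonal curves. The sketch below assumes a particular parameterization that the text does not fix, namely that every edge of a curve covers an equal share of [0, 1]; the function names are illustrative only.

```python
import math


def point_on_curve(curve, t):
    """Point of a polygonal curve (at least two vertices) at parameter t.

    Assumption: each edge occupies an interval of equal length in [0, 1].
    """
    edges = len(curve) - 1
    s = min(max(t, 0.0), 1.0) * edges
    i = min(int(s), edges - 1)   # index of the edge containing t
    frac = s - i                 # position within that edge, in [0, 1]
    p, q = curve[i], curve[i + 1]
    return tuple(a + frac * (b - a) for a, b in zip(p, q))


def elevation(X, Y, x, y):
    """Elevation d(p) of the point p = (x, y) of the parametric space."""
    return math.dist(point_on_curve(X, x), point_on_curve(Y, y))


def in_free_space(X, Y, x, y, delta):
    """True if (x, y) lies in the delta-free space of X and Y."""
    return elevation(X, Y, x, y) <= delta
```

Under this parameterization the grid lines of the parametric space sit at multiples of 1/(number of edges) along each axis.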
Figure 1: (A) Example of two dissimilar curves that can be made similar by shortcutting
one of them. (B) The shortcut corresponds to a tunnel between disconnected components of
the free space. (C) The curve Z resulting from shortcutting Y. Its (regular) Fréchet distance
from X is dramatically reduced. (Horizontal and vertical axes of the parametric space are
exchanged in this figure.)



2.2 Tunnels and gates — definitions 

2.2.1 Tunnels 

In the parametric space, a shortcut $\overline{Y}[y_p, y_q]$ and the subcurve $X[x_p, x_q]$ that it is being
matched to correspond to a segment $pq \subseteq [0,1]^2$, called a tunnel and denoted by
$\tau(p, q)$, where $p = (x_p, y_p)$ and $q = (x_q, y_q)$. We require $x_p \le x_q$ and $y_p \le y_q$ for
monotonicity. Refer to Figure 1 for an example of a tunnel. We call the Fréchet distance of the
shortcut segment to the subcurve the price of this tunnel and denote it with
$\mathrm{prc}(\tau(p, q)) = d_{\mathcal{F}}(X[x_p, x_q], \overline{Y}[y_p, y_q])$. A tunnel $\tau(p, q)$ is feasible for $\delta$ if it holds
that $d(p) \le \delta$ and $d(q) \le \delta$, i.e., if $p, q \in \mathcal{D}_{\le\delta}(X, Y)$. Now, let $u = Y(y_p)$ and $v = Y(y_q)$
and let e be the edge of X that contains $X(x_p)$ (resp. e' the edge that contains $X(x_q)$) for the
tunnel $\tau(p, q)$. We denote with $\mathcal{T}(e, e', u, v)$ the family of tunnels that $\tau(p, q)$ belongs to.
Furthermore, let $\mathcal{T}_{\le\delta}(e, e', u, v)$ denote the subset of these tunnels that are feasible for $\delta$.

Definition 2.4. The canonical tunnel of the tunnel family $\mathcal{T}(e, e', u, v)$, denoted by
$\tau_{\min}(e, e', u, v)$, is the tunnel that matches the shortcut $\overline{uv}$ to the subcurve $X[s, t]$, such
that s and t are the values realizing

\[ r_{\min}(e, e', u, v) = \min_{\substack{X(s) \in e,\; X(t) \in e', \\ s \le t}} \max\bigl( \|X(s) - u\| ,\; \|X(t) - v\| \bigr) . \tag{2} \]

We refer to $r_{\min}(e, e', u, v)$ as the minimum radius of this family.

Clearly, one can compute the canonical tunnel $\tau_{\min}(e, e', u, v)$ in constant time. In
particular, the price of this canonical tunnel is

\[ \mathrm{prc}(\tau_{\min}(e, e', u, v)) = d_{\mathcal{F}}(X[s, t], \overline{uv}) . \tag{3} \]

We emphasize that a shortcut is always a segment connecting two vertices of the curve
Y, and a tunnel is always a segment in the parametric space; that is, they exist in two
completely different domains.



2.2.2 Gates 

Consider the edges of the grid in the parametric space of X and Y that each correspond to
an edge of X and a vertex of Y, say $X[x_i, x_{i+1}]$ and $Y(y_j)$. Let $e_{ij}$ denote such a grid edge
and assume we are given a subset U of the parametric space, which is convex in every cell
of the grid. If $U \cap e_{ij}$ is non-empty, we call the point $(x, y_j)$ a gate of U, if x is either the
minimum or maximum of the set $\{x \mid (x, y_j) \in U \cap e_{ij}\}$. The set of gates of U is then the set
of such points with respect to all possible edges $e_{ij}$ of the grid. Furthermore, we define the
canonical gate of an edge $e_{ij}$ as the point minimizing $\|X(x) - Y(y_j)\|$ for $x_i \le x \le x_{i+1}$.
Note that canonical gates serve as endpoints of canonical tunnels (defined above) that span
across columns of the parametric space.
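
Computing a canonical gate amounts to projecting the vertex $Y(y_j)$ onto the edge of X. The following sketch returns the position along the edge as a local parameter in [0, 1] together with the elevation at that point; the function name and the local parameter `lam` are assumptions of this example, and mapping `lam` back to the global parameter would be $x_i + \mathrm{lam} \cdot (x_{i+1} - x_i)$.

```python
import math


def canonical_gate(a, b, v):
    """Closest point of the edge ab of X to the vertex v of Y.

    Returns (lam, dist): the point a + lam * (b - a), with lam in [0, 1],
    minimizes the distance to v, and dist is that minimum distance.
    """
    d = [q - p for p, q in zip(a, b)]  # edge direction
    w = [q - p for p, q in zip(a, v)]  # v relative to a
    dd = sum(x * x for x in d)
    if dd == 0.0:
        lam = 0.0                      # degenerate edge: a == b
    else:
        # Orthogonal projection onto the supporting line, clamped to the edge.
        lam = min(1.0, max(0.0, sum(x * y for x, y in zip(d, w)) / dd))
    closest = tuple(p + lam * x for p, x in zip(a, d))
    return lam, math.dist(closest, v)
```

The clamping step handles vertices whose projection falls outside the edge, in which case the canonical gate sits at an endpoint of the grid edge.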

2.3 Curve Simplification 

We use the following simple algorithm for the simplification of the input curves. It is easy
to verify that the curve simplified with parameter $\mu$ is in Fréchet distance at most $\mu$ to the
original curve, see [DHW10].

Definition 2.5. Given a polygonal curve X and a parameter $\mu > 0$. First mark the initial
vertex of X and set it as the current vertex. Now scan the polygonal curve from the current
vertex until it reaches the first vertex that is in distance at least $\mu$ from the current vertex.
Mark this vertex and set it as the current vertex. Repeat this until reaching the final vertex
of the curve, and also mark it. We refer to the resulting curve $\hat{X}$ that connects only the
marked vertices, in their order along X, as a $\mu$-simplification of X and we denote it with
$\mathrm{simpl}(X, \mu)$.
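
The marking procedure of Definition 2.5 translates almost literally into code. The sketch below is one possible reading of it; the function name `simplify` is hypothetical.

```python
import math


def simplify(curve, mu):
    """mu-simplification of a polygonal curve given as a list of vertices."""
    if not curve:
        return []
    marked = [curve[0]]   # the initial vertex is always marked
    current = curve[0]
    for v in curve[1:-1]:
        # Mark the first vertex at distance at least mu from the current one.
        if math.dist(current, v) >= mu:
            marked.append(v)
            current = v
    if len(curve) > 1:
        marked.append(curve[-1])  # the final vertex is always marked
    return marked
```

A single linear scan suffices, since the current vertex only ever moves forward along the curve.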

During the course of the algorithm we will simplify the input curves in order to reduce
the complexity of the free space. Since the k-shortcut Fréchet distance does not satisfy the
triangle inequality, we need the next lemma to ensure that the computed distance between
the simplified curves approximates the distance between the original curves.

Lemma 2.6. Given a simplification parameter $\mu$ and two polygonal curves X and Y, let
$\hat{X} = \mathrm{simpl}(X, \mu)$ and $\hat{Y} = \mathrm{simpl}(Y, \mu)$ denote their $\mu$-simplifications, respectively. For
any $k \in \mathbb{N}$, it holds that
$d_{\mathcal{S}}(k, X, Y) - 2\mu \le d_{\mathcal{S}}(k, \hat{X}, \hat{Y}) \le d_{\mathcal{S}}(k, X, Y) + 2\mu$. Similarly,
$d_{\mathcal{S}}(X, Y) - 2\mu \le d_{\mathcal{S}}(\hat{X}, \hat{Y}) \le d_{\mathcal{S}}(X, Y) + 2\mu$.

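Proof: The proof is straightforward and is included for the sake of completeness.

First, we show that $d_{\mathcal{S}}(k, \hat{X}, \hat{Y}) \le d_{\mathcal{S}}(k, X, Y) + 2\mu$. Let
$Y' = Y_1 + \overline{Y}_2 + \dots + Y_{2k'+1}$ be a k'-shortcut curve of Y, with $k' \le k$, and such that
$d_{\mathcal{F}}(X, Y') \le d_{\mathcal{S}}(k, X, Y)$, and let $X = X_1 + X_2 + \dots + X_{2k'+1}$ be the decomposition of X
induced by the reparametrization realizing the Fréchet distance between X and Y'. We have that

\[ d_{\mathcal{S}}(k, X, Y) = \max\Bigl( \max_{0 \le i \le k'} d_{\mathcal{F}}(X_{2i+1}, Y_{2i+1}) ,\; \max_{1 \le i \le k'} d_{\mathcal{F}}(X_{2i}, \overline{Y}_{2i}) \Bigr) , \]

which implies that the Fréchet distance between $Y_i$ (resp. $\overline{Y}_i$) and $X_i$ is smaller than or
equal to $d_{\mathcal{S}}(k, X, Y)$ for any $1 \le i \le 2k'+1$.

Consider a one-to-one matching between X and $\hat{X}$ that realizes the Fréchet distance
between them and consider the decomposition $\hat{X} = \hat{X}_1 + \hat{X}_2 + \dots + \hat{X}_{2k'+1}$, such that
$\hat{X}_i$ is matched one-to-one to $X_i$ under this matching. It holds that $d_{\mathcal{F}}(X_i, \hat{X}_i) \le \mu$, by
Definition 2.5.

Now, in a similar way, let $\hat{Y}' = \hat{Y}_1 + \overline{\hat{Y}}_2 + \dots + \hat{Y}_{2k'+1}$ be a k'-shortcut curve of $\hat{Y}$, such
that for any $0 \le i \le k'$, the subcurve $\hat{Y}_{2i+1}$ is matched one-to-one to $Y_{2i+1}$ under a matching
that realizes the Fréchet distance between Y and $\hat{Y}$. We have that $d_{\mathcal{F}}(\overline{Y}_i, \overline{\hat{Y}}_i) \le \mu$ for any
of the shortcuts, since their endpoints are in distance $\mu$. It holds that

\[ d_{\mathcal{S}}(k, \hat{X}, \hat{Y}) \le d_{\mathcal{F}}(\hat{X}, \hat{Y}') = \max\Bigl( \max_{i} d_{\mathcal{F}}(\hat{X}_{2i+1}, \hat{Y}_{2i+1}) ,\; \max_{i} d_{\mathcal{F}}(\hat{X}_{2i}, \overline{\hat{Y}}_{2i}) \Bigr) . \]

By the triangle inequality, we have that

\[ d_{\mathcal{F}}(\hat{X}_{2i+1}, \hat{Y}_{2i+1}) \le d_{\mathcal{F}}(\hat{X}_{2i+1}, X_{2i+1}) + d_{\mathcal{F}}(X_{2i+1}, Y_{2i+1}) + d_{\mathcal{F}}(Y_{2i+1}, \hat{Y}_{2i+1}) \le d_{\mathcal{S}}(k, X, Y) + 2\mu . \]

Similarly, we have

\[ d_{\mathcal{F}}(\hat{X}_{2i}, \overline{\hat{Y}}_{2i}) \le d_{\mathcal{F}}(\hat{X}_{2i}, X_{2i}) + d_{\mathcal{F}}(X_{2i}, \overline{Y}_{2i}) + d_{\mathcal{F}}(\overline{Y}_{2i}, \overline{\hat{Y}}_{2i}) \le d_{\mathcal{S}}(k, X, Y) + 2\mu . \]

This implies the second inequality in the claim. We can argue in the same way that
$d_{\mathcal{S}}(k, X, Y) \le d_{\mathcal{S}}(k, \hat{X}, \hat{Y}) + 2\mu$, which implies the first inequality. ∎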

The proof of the following lemma can be found in [DHW10]. Although the definition of
the complexity of the free space $\mathcal{N}_{\le\delta}$ is slightly different here, the proof still applies.

Lemma 2.7 ([DHW10]). For any two c-packed curves X and Y in $\mathbb{R}^d$ of total complexity
n, and two parameters $0 < \varepsilon < 1$ and $\delta > 0$, we have that
$\mathcal{N}_{\le\delta}(\mathrm{simpl}(X, \mu), \mathrm{simpl}(Y, \mu)) = O(cn/\varepsilon)$, where $\mu = \Theta(\varepsilon\delta)$.

2.4 Building blocks for the algorithm 

The algorithm uses the following two non-trivial data-structures. 

Data-Structure 2.8. Given a polygonal curve Z with n vertices in $\mathbb{R}^d$, one can build a
data-structure, in $O(n \varepsilon^{-2d} \log^2(1/\varepsilon) \log^2 n)$ time, using $O(n \varepsilon^{-2d} \log^2(1/\varepsilon))$ space, such that
given a query segment pq, and any two points u, v on the curve (and the segments of
the curve that contain them), one can $(1+\varepsilon)$-approximate the distance $d_{\mathcal{F}}(Z\langle u, v \rangle, pq)$ in
$O(\varepsilon^{-2} \log n \log\log n)$ time. See Section 6 and Theorem 6.7.



Data-Structure 2.9. For given parameters $\varepsilon$ and $\delta$, and two c-packed curves X and Y
in $\mathbb{R}^d$, let $\hat{X} = \mathrm{simpl}(X, \mu)$ and $\hat{Y} = \mathrm{simpl}(Y, \mu)$, where $\mu = \varepsilon\delta$. One can compute all the
vertex-edge pairs of the two simplified curves $\hat{X}$ and $\hat{Y}$ in distance at most $\delta$ from each other,
in time $O(n \log n + c^2 n/\varepsilon)$. See Section 2.4.1 and Lemma 2.10.






2.4.1 Data-structure for range reporting under low density 

Lemma 2.10. Let X and Y be two c-packed curves in $\mathbb{R}^d$, let $\hat{X} = \mathrm{simpl}(X, \mu)$ and
$\hat{Y} = \mathrm{simpl}(Y, \mu)$. Then one can compute all the vertex-edge pairs of the two simplified curves
$\hat{X}$ and $\hat{Y}$ in distance at most $\delta$ from each other, in time $O(n \log n + c^2 n/\varepsilon)$, where
$\varepsilon = \mu/\delta$.

Proof: Observe that $\hat{X}$ and $\hat{Y}$ have density $\phi = O(c)$. Now, we build the data-structure
of de Berg and Streppel [dBS06] for the segments of $\hat{Y}$ (with $\varepsilon = 1/2$). For each vertex
of $\hat{X}$ we compute all the segments of $\hat{Y}$ that are in distance at most $\delta$ from it, using the
data-structure [dBS06]. Each query takes $O(\log n + k\phi)$ time, where k is the number of edges
reported. Lemma 2.7 implies that the total sum of the k's is $O(cn/\varepsilon)$.

We now repeat this for the other direction. Together, this implies the claim. ∎



2.5 Monotonicity of the Prices of Tunnels 

Lemma 2.11. Given a value $\delta > 0$ and two curves $X_1$ and $X_2$, such that $X_2$ is a subcurve
of $X_1$, and given two line segments $Y_1$ and $Y_2$, such that $d_{\mathcal{F}}(X_1, Y_1) \le \delta$ and the start (resp.
end) point of $X_2$ is in distance $\delta$ to the start (resp. end) point of $Y_2$, then $d_{\mathcal{F}}(X_2, Y_2) \le 3\delta$.

Proof: Let u denote the subsegment of $Y_1$ that is matched to $X_2$ under an optimal Fréchet
mapping between $X_1$ and $Y_1$. We know that $d_{\mathcal{F}}(X_2, u) \le \delta$ by this mapping. The start point
of $Y_2$ is in distance $2\delta$ to the start point of u, since they are both in distance $\delta$ to the start
point of $X_2$, and the same holds for the end points. This implies that $d_{\mathcal{F}}(u, Y_2) \le 2\delta$. Now, by
the triangle inequality, $d_{\mathcal{F}}(X_2, Y_2) \le d_{\mathcal{F}}(X_2, u) + d_{\mathcal{F}}(u, Y_2) \le 3\delta$. ∎

Lemma 2.12. Consider two polygonal curves X and Y, three points p, q and r in their
free space, and let $\delta' = \max(d(p), d(q), d(r))$. If $x(p) \le x(q) \le x(r)$ then
$\mathrm{prc}(\tau(q, r)) \le 3 \max(\delta', \mathrm{prc}(\tau(p, r)))$.¹

Proof: Let $X_1$ be the subcurve $X[x(p), x(r)]$, and let $X_2 = X[x(q), x(r)]$. Similarly, let $Y_1$
be the shortcut $\overline{Y}[y(p), y(r)]$ and let $Y_2 = \overline{Y}[y(q), y(r)]$. Setting $\delta = \max(d_{\mathcal{F}}(X_1, Y_1), \delta')$,
the claim now immediately follows from Lemma 2.11. ∎



The monotonicity property of Lemma 2.12 holds even if the tunnels under consideration
are not valid. For example, if $x(p) < x(r)$ and $y(p) > y(r)$ then the tunnel $\tau(p, r)$ is not a
valid tunnel and it cannot be used by a valid solution. Nevertheless, $\tau(p, r)$ has a well defined
price, and these prices have the required monotonicity property of Lemma 2.12.

The following is an easy consequence of Lemma 2.12.



Lemma 2.13. Let $p_1, \ldots, p_m$ be $m$ points in the free space, and let $\delta > 0$ and $i$ be
parameters, such that (i) $x(p_1) \le x(p_2) \le \cdots \le x(p_m)$, (ii) $d(p_j) \le \delta$, for all $j$, and
(iii) $\psi = \mathrm{prc}(\tau(p_i, p_m))$. Then, we have:

(A) If $\psi > \delta$ then for all $j > i$, we have $\mathrm{prc}(\tau(p_j, p_m)) \le 3\psi$.

(B) If $\psi > 3\delta$ then for all $j < i$, we have $\mathrm{prc}(\tau(p_j, p_m)) \ge \psi/3$.
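Both parts can be read off from Lemma 2.12 in one line each. The following derivation is a sketch of that reasoning, with $\psi = \mathrm{prc}(\tau(p_i, p_m))$ and with the elevations of all points bounded by $\delta$ as in condition (ii); it is a reconstruction, not a proof quoted from the paper.

```latex
\begin{align*}
\text{(A)}\quad & j > i:\ \text{apply Lemma 2.12 to } (p, q, r) = (p_i, p_j, p_m): \\
 & \mathrm{prc}(\tau(p_j, p_m)) \le 3\max\bigl(\delta,\ \mathrm{prc}(\tau(p_i, p_m))\bigr)
   = 3\max(\delta, \psi) = 3\psi \quad \text{since } \psi > \delta. \\[4pt]
\text{(B)}\quad & j < i:\ \text{apply Lemma 2.12 to } (p, q, r) = (p_j, p_i, p_m): \\
 & \psi = \mathrm{prc}(\tau(p_i, p_m)) \le 3\max\bigl(\delta,\ \mathrm{prc}(\tau(p_j, p_m))\bigr). \\
 & \text{As } \psi > 3\delta, \text{ the maximum cannot be } \delta,
   \text{ so } \mathrm{prc}(\tau(p_j, p_m)) \ge \psi/3.
\end{align*}
```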



^1 Here, $d(p)$ is the elevation of $p$, see Eq. (1), and $\tau(q, r)$ is the tunnel between $q$ and $r$, see Section 2.2.1.

tunnel($R$, $p$, $\varepsilon$, $\delta$)
 1   $q = (x_q, y_q)$: point in $R$ with max value of $x_q$,
         such that $x_q \le x_p$ and $y_q \le y_p$, where $p = (x_p, y_p)$.
 2   if $\exists\,\phi \le 3\delta$ with $\phi \le \mathrm{prc}(\tau(q, p)) \le (1 + \varepsilon)\phi$ then
 3       Return $p$                      // $qp$: tunnel returned
 4   Compute $j$ such that $x_p \in \mathrm{Xedge}(X, j) = [x_j, x_{j+1}]$
 5   $q = (x_q, y_q)$: point in $R$ with min value of $x_q$,
         such that $x_q \in \mathrm{Xedge}(X, j)$, $x_q \ge x_p$, and $y_q \le y_p$
 6   if $q$ exists and $d(q) \le \delta$ then
 7       Return $v = (x_q, y_p)$         // $qv$: vertical tunnel returned
 8   else
 9       Return null.

Figure 2: The tunnel procedure returns the endpoint of an affordable tunnel if it exists. 

3 Approximating the shortcut Frechet distance 

Here, we describe the approximation algorithm for the case that the number of shortcuts 
used is unbounded. 



3.1 The tunnel procedure 

A key element in the decision procedure is the tunnel procedure depicted in Figure 2. This
procedure receives a set of gates $R$ and a gate $p$ as input and returns the endpoint of an
affordable tunnel that starts at a gate of $R$ and ends either at $p$ or the closest point to the
right of $p$ in the same free space interval. More specifically, if a tunnel between a gate in $R$
and the free space interval of $p$ exists, which has price less than $\delta$, then the algorithm will
return a tunnel of price less than or equal to $(1 + \varepsilon)3\delta$. If the algorithm returns null, then we
know that no such tunnel of price less than $\delta$ exists. During the decision procedure, we will
repeatedly invoke the tunnel procedure with a set of gates $R$, for which we already know
that they are contained in the reachable free space $\mathcal{R}_{\le\delta}(X, Y)$, and the left gate associated
with a horizontal free space interval of $\mathcal{D}_{\le\delta}(X, Y)$, in order to determine if and to what
extent this interval is reachable.
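The control flow of the procedure in Figure 2 can be sketched in Python as below. This is only an illustration under stated assumptions: gate lookups are done by a linear scan purely for brevity, and the callbacks `approx_price`, `elevation` and `column` are hypothetical stand-ins for the price oracle, the elevation function $d(\cdot)$ and the lookup of the free space column containing $x_p$.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]

def tunnel(gates: List[Point], p: Point, eps: float, delta: float,
           approx_price: Callable[[Point, Point, float], float],
           elevation: Callable[[Point], float],
           column: Callable[[float], Tuple[float, float]]) -> Optional[Point]:
    """Sketch of Figure 2; all three callbacks are assumed helpers."""
    xp, yp = p
    # Line 1: gate in the lower-left quadrant of p with maximal x-coordinate.
    lower_left = [q for q in gates if q[0] <= xp and q[1] <= yp]
    if lower_left:
        q = max(lower_left, key=lambda g: g[0])
        # Lines 2-3: phi is assumed to satisfy phi <= price <= (1 + eps) * phi.
        phi = approx_price(q, p, eps)
        if phi <= 3 * delta:
            return p
    # Line 4: x-range [x_j, x_{j+1}] of the column that contains x_p.
    x_lo, x_hi = column(xp)
    # Line 5: leftmost gate in the same column, to the right of and below p.
    lower_right = [q for q in gates
                   if x_lo <= q[0] <= x_hi and q[0] >= xp and q[1] <= yp]
    if lower_right:
        q = min(lower_right, key=lambda g: g[0])
        # Lines 6-7: vertical tunnel, following the test stated in the figure.
        if elevation(q) <= delta:
            return (q[0], yp)
    # Lines 8-9: no affordable tunnel was found.
    return None
```

A caller would pass the current gate set and the left gate of the interval under consideration; the return value is either an endpoint in that interval or `None`.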

The main idea of the tunnel procedure is the following. For a given tunnel, we can $(1+\varepsilon)$-approximate
its price, using a data-structure which answers these queries in polylogarithmic
time, see Data-Structure 2.8. The desired tunnel could be a vertical tunnel which starts
at a gate of $R$, or a tunnel between a gate of $R$ and $p$. Naively, one could test all tunnels
that start from a gate in $R$ and end in $p$; however, this takes time at least linear in the size
of $R$. Since we are only interested in a constant factor approximation, it is sufficient, by
Lemma 2.12, to test only the tunnel which corresponds to the shortest subcurve of $X$. The
corresponding gates can be found in polylogarithmic time using a two dimensional range
tree, which is built on the set $R$ and which we assume is available to us.

decider($X$, $Y$, $\varepsilon$, $\delta$)
 1   Assert that $d(0,0) = \|X(0) - Y(0)\| \le \delta$ and $d(1,1) \le \delta$
 2   Let $Q$ be a min-priority queue for nodes $v(i,j)$ with keys $(jn + i)$
 3   Compute and enqueue the cells $C_{i,j}$ that have non-empty $I^h_{i,j}$ or $I^v_{i,j}$
 4   Let $R$ be an empty set
 5   while $Q \neq \emptyset$ do
 6       Dequeue node $v(i,j)$ and its copies from $Q$
 7       Let $p$ be the left gate of $I^h_{i,j}$
 8       $v$ = tunnel($R$, $p$, $\varepsilon$, $\delta$)
 9       Compute $R^h_{i,j}$ and $R^v_{i,j}$ from $v$, $R^v_{i-1,j}$, $R^h_{i,j-1}$, $I^h_{i,j}$ and $I^v_{i,j}$
10       if $R^v_{i,j} \neq \emptyset$ then
11           Enqueue $v(i+1, j)$ and insert edge between $v(i,j)$ and $v(i+1, j)$
12       if $R^h_{i,j} \neq \emptyset$ then
13           Enqueue $v(i, j+1)$ and insert edge between $v(i,j)$ and $v(i, j+1)$
14           Add gates of $R^h_{i,j}$ to $R$
15   if $(1,1) \in R$ then
16       Return "$d_S(X,Y) \le (1 + \varepsilon)3\delta$"
17   else
18       Return "$d_S(X,Y) > \delta$"

Figure 3: The decision procedure decider for the shortcut Frechet distance. 



3.2 The decision algorithm 

In the decision problem we want to know whether the shortcut Fréchet distance between
two curves, $X$ and $Y$, is smaller than or equal to a given value $\delta$. The free space diagram $\mathcal{D}_{\le\delta}(X, Y)$
may consist of a certain number of disconnected components, and our task is to find a
monotone path from $(0,0)$ to $(1,1)$ that traverses these components while using shortcuts
between vertices of $Y$ to "bridge" between points in different components or where there
is no monotone path connecting them (see Figure 1). The decision algorithm exploits the
monotonicity of the tunnel prices shown in Lemma 2.12 and is based on a breadth first search
in the free space diagram (a similar idea was used in [DHW10], but here the details are more
involved).

Given two curves $X$ and $Y$, and parameters $\delta$ and $\varepsilon$, the algorithm will output an answer
equivalent to "yes" if there exists a shortcut curve $Y'$ of $Y$ such that $d_F(X, Y') \le \delta$, and an
answer equivalent to "no" if there exists no shortcut curve such that $d_F(X, Y') \le (1 + \varepsilon)3\delta$;
otherwise, the algorithm may output either of the two answers.



3.2.1 Detailed description of the decision procedure 

The decision algorithm is depicted in Figure 4. The algorithm uses a directed graph $G$ that
has a node $v(i,j)$ for every free space cell $C_{i,j}$ whose boundary has a non-empty intersection
with the free space $\mathcal{D}_{\le\delta}(X, Y)$.

Decider($X$, $Y$, $\varepsilon$, $\delta$)
 1:  Let $\varepsilon' = \varepsilon/10$
 2:  Compute $\widehat{X} = \mathrm{simpl}(X, \mu)$ and $\widehat{Y} = \mathrm{simpl}(Y, \mu)$ with $\mu = \Theta(\varepsilon'\delta)$
 3:  Call decider($\widehat{X}$, $\widehat{Y}$, $\varepsilon'$, $\delta'$) with $\delta' = (1 + 2\varepsilon')\delta$
 4:  Return either "$d_S(X,Y) \le (1 + \varepsilon)3\delta$" or "$d_S(X,Y) > \delta$"

Figure 4: The resulting decision procedure Decider. A detailed description of the complete
algorithm is given in Section 3.2.1.
Recall that these intersections are defined as the free space intervals
$I^h_{i,j}$, $I^v_{i,j}$, $I^v_{i-1,j}$ and $I^h_{i,j-1}$ (see figure on the right). For any path along
the edges of the graph $G$ from $v(1,1)$ to $v(i,j)$, there exists a monotone
path that traverses the corresponding cells of $\mathcal{D}_{\le\delta}(X, Y)$ while using
zero or more affordable tunnels. A node $v(i,j)$ can have an incoming
edge from another node $v(i',j')$, if $i' \le i$ and $j' \le j$ and either $v(i',j')$
is a neighboring node, or the two cells can be connected by an affordable tunnel which starts
at the lower boundary of $v(i',j')$ and ends at the upper boundary of $v(i,j)$. The idea of the
algorithm is to propagate reachability intervals $R^h_{i,j} \subseteq I^h_{i,j}$ and $R^v_{i,j} \subseteq I^v_{i,j}$ while traversing a
sufficiently large subgraph starting from $v(1,1)$, and computing the necessary parts of this
subgraph on the fly. We store these intervals with the cell $v(i,j)$ that has them on the top
(resp. right) boundary. The reachability intervals $R^h_{i,j}$ being computed satisfy



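$$\mathcal{R}_{\le\delta}(X, Y) \cap I^h_{i,j} \;\subseteq\; R^h_{i,j} \;\subseteq\; \mathcal{R}_{\le(1+\varepsilon)3\delta}(X, Y) \cap I^h_{i,j} \qquad (4)$$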



and an analogous statement applies to $R^v_{i,j}$. The aim is to determine if either $(1,1) \in
\mathcal{R}_{\le(1+\varepsilon)3\delta}(X, Y)$ or $(1,1) \notin \mathcal{R}_{\le\delta}(X, Y)$. Throughout the whole algorithm we also maintain
a set of gates $R$, which represents the endpoints of the horizontal reachability intervals
computed so far.

We will traverse the graph by handling the nodes in a row-by-row order, thereby handling
any node $v(i,j)$ only after we handled the nodes $v(i',j')$, where $j' \le j$, $i' \le i$ and $(i' + j') <
(i + j)$. To this end we keep the nodes in a min-priority queue where the node $v(i,j)$ has the
key $(jn + i)$. The correctness of the computed reachability intervals will follow by induction
on the order of these keys. Furthermore, it will ensure that we handle each node at most
once and that we traverse at most three of the incoming edges to each node of the graph.

The queue is initialized with the entire node set at once. To compute this initial node set
and the corresponding free space intervals we use Data-Structure 2.9. The algorithm then
proceeds by handling nodes in the order of extraction from this queue. When dequeuing
nodes from the queue, the same node might appear three times (consecutively) in this queue:
once from each of its direct neighbors in the grid and once from the initial enqueuing.

In every iteration, the algorithm dequeues the one or more copies of the same node $v(i,j)$
and merges them into one node if necessary. Assume that $v(i,j)$ has an incoming edge that
corresponds to an affordable tunnel. Let $p$ be the left gate of $I^h_{i,j}$. We invoke tunnel($R$, $p$, $\varepsilon$, $\delta$)
to test if this is the case. If the call returns null, then there is no such affordable tunnel.
Otherwise, we know that the returned point $v$ is contained in $R^h_{i,j}$. If there was more than
one copy of this node in the queue, we also access the reachability intervals of the one or
two neighboring vertices (i.e., $R^v_{i-1,j}$ and $R^h_{i,j-1}$). Using the reachability information from
the at most three incoming edges obtained this way, we can determine if the cells $C_{i,j+1}$ and
$C_{i+1,j}$ are reachable, by computing the resulting reachability intervals $R^h_{i,j}$ at the top side
and $R^v_{i,j}$ at the right side of the cell $C_{i,j}$. Since the free space within a cell is convex and of
constant complexity, this can be done in constant time.

Now, if $R^h_{i,j} \neq \emptyset$ we create a node $v(i, j+1)$, connect it to $v(i,j)$ by an edge, enqueue
it, and add the gates of $R^h_{i,j}$ to $R$. If $R^v_{i,j} \neq \emptyset$ we create a node $v(i+1, j)$, connect it to
$v(i,j)$ by an edge, and enqueue it. If we discover that the top-right corner of the free
space diagram is reachable this way, we output the equivalent to "yes" and the algorithm
terminates. In this case we must have added $(1,1)$ as a gate to $R$. The algorithm may also
terminate before this happens if there are no more nodes in the queue; in this case we output
that no suitable shortcut curve exists.
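The queue discipline described above (keys $jn + i$, up to three consecutive copies of a node that are merged on extraction) can be illustrated with a small Python sketch. The function `process_in_row_order` and the `successors` callback are hypothetical names; the sketch only models the ordering and merging, not the reachability computation itself.

```python
import heapq

def process_in_row_order(initial_nodes, n, successors):
    """Visit grid nodes (i, j) in increasing order of the key j*n + i.

    initial_nodes: nodes enqueued up front (assumed 0 <= i < n, so keys are unique).
    successors(i, j): assumed callback returning the neighbors to enqueue.
    Returns a list of ((i, j), number_of_merged_copies) in visiting order.
    """
    heap = [(j * n + i, i, j) for (i, j) in initial_nodes]
    heapq.heapify(heap)
    order = []
    while heap:
        key, i, j = heapq.heappop(heap)
        copies = 1
        # Copies of one node share its key, so they leave the queue consecutively.
        while heap and heap[0][0] == key:
            heapq.heappop(heap)
            copies += 1
        order.append(((i, j), copies))
        for (a, b) in successors(i, j):
            heapq.heappush(heap, (b * n + a, a, b))
    return order

# Toy 2 x 2 grid in which every node enqueues its right and upper neighbor.
def grid_successors(i, j, size=2):
    return [(a, b) for (a, b) in ((i + 1, j), (i, j + 1)) if a < size and b < size]

visit_order = process_in_row_order([(0, 0), (1, 0), (0, 1), (1, 1)], 2, grid_successors)
```

In the toy run the top-right node is extracted last and with three merged copies, matching the "at most three incoming edges" bookkeeping of the text.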

3.3 The main algorithm 

The given input is two curves $X$ and $Y$. We want to use the approximate decision procedure
Decider, described above, in a binary search like fashion to compute the shortcut Fréchet
distance. Conceptually, one can think of the decider as being exact. In particular, the
algorithm would, for a given value of $\delta$, call the decision procedure twice with parameters $\delta$
and $\delta' = \delta/4$ (using $\varepsilon = 1/3$). If the two calls agree, then we can make an exact decision;
if the two calls disagree, then we can output an $O(1)$-approximation of the shortcut Fréchet
distance.
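To see why two calls suffice, note that with $\varepsilon = 1/3$ the decider answers either "at most $4\delta$" or "greater than $\delta$". The Python sketch below spells out the case analysis; `two_call_decision` is a hypothetical name, and the `decider` argument is an assumed black box returning `True` for the first answer and `False` for the second.

```python
def two_call_decision(decider, delta):
    """Combine two approximate decisions into an exact one or an O(1)-approximation.

    decider(d) is assumed to return True  meaning "distance <= 4 * d"
                                 and False meaning "distance >  d".
    """
    low = decider(delta / 4)   # True: distance <= delta;     False: distance > delta / 4
    high = decider(delta)      # True: distance <= 4 * delta; False: distance > delta
    if low and high:
        return ("at_most", delta)               # exact: distance <= delta
    if not low and not high:
        return ("greater", delta)               # exact: distance > delta
    if not low and high:
        return ("approx", (delta / 4, 4 * delta))  # distance lies in (delta/4, 4*delta]
    raise ValueError("answers contradict each other")

# Mock decider for a hidden distance of 1.0; any threshold between d and 4*d is admissible.
mock = lambda d: 1.0 <= 2 * d
```

The last branch can only be reached by a faulty decider, since "at most $\delta$" and "greater than $\delta$" cannot both hold.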

The challenge is how to choose the right subset of candidate values to guide this binary
search. Some of the techniques used for this search have been introduced in previous papers.
In particular, this holds for the search over vertex-vertex, vertex-edge and monotonicity
events, which we describe as preliminary computations in Section 3.3.1. This stage of the
algorithm eliminates the candidate values that also need to be considered for the approximation
of the standard Fréchet distance and it is almost identical to the algorithm presented
in [DHW10].

As mentioned before, a monotone path could also become usable by taking a tunnel.
There are two types of events associated with a tunnel family. The first is the first time that any
tunnel in this family is feasible, which is the creation radius. Fortunately, the creation
radii of all tunnels are approximated by the set of vertex-vertex and vertex-edge event radii,
and our first stage search would thus take care of such events.

The other events we have to worry about are the first time that the feasible family of
tunnels becomes usable via a tunnel (i.e., the price of some tunnel in this family drops
below the distance threshold $\delta$). Luckily, it turns out that it is sufficient to search over the
prices of the natural tunnel associated with such a family. The price of a specific tunnel
can be approximated quickly using Data-Structure 2.8. However, there are $\Theta(n^4)$ tunnel
families, and potentially the algorithm has to consider all of them. Fortunately, because of
$c$-packedness, only $O(n^2)$ of these events are relevant. A further reduction in running time
is achieved by using a certain monotonicity property of the prices of these tunnels and our
ability to represent them implicitly to search over them efficiently.

3.3.1 The algorithm: First stage

We are given two $c$-packed polygonal curves $X$ and $Y$ with total complexity $n$. We repeatedly
compute sets of event values and perform binary searches on these values as follows.

We compute the set of vertices $V$ of the two curves, and using well-separated pairs
decomposition, we compute, in $O(n \log n)$ time, a set $U$ of $O(n)$ distances that, up to a
factor of two, represents any distance between any two vertices of $V$. Next, we use Decider
(with fixed $\varepsilon = 1/3$) to perform a binary search for the atomic interval in $U$ that contains
the desired distance. Let $[\alpha, \beta]$ denote this interval. If $10\alpha \ge \beta/10$ then we are done, since
we found a constant size interval that contains the Fréchet distance. Otherwise, we use the
decision procedure to verify that the desired radius is not in the range $[\alpha, 10\alpha]$ and $[\beta/10, \beta]$.
For $\alpha' = 3\alpha$, $\beta' = \beta/3$, let $\mathcal{I}' = [\alpha', \beta']$ denote the obtained interval.
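The search for the atomic interval is an ordinary binary search over the sorted candidate values, driven by the decision procedure. The sketch below assumes, as the text does conceptually, that the decider behaves like an exact monotone predicate; `atomic_interval` and `is_at_least` are illustrative names that do not appear in the paper.

```python
def atomic_interval(values, is_at_least):
    """Return consecutive candidates (alpha, beta) with alpha < target <= beta.

    values: candidate event values (any order, duplicates allowed).
    is_at_least(v): assumed exact decision "v >= target", monotone in v.
    Falls back to 0.0 or infinity when the target lies outside the candidates.
    """
    vals = sorted(set(values))
    lo, hi = 0, len(vals)          # invariant: vals[:lo] fail, vals[hi:] pass
    while lo < hi:
        mid = (lo + hi) // 2
        if is_at_least(vals[mid]):
            hi = mid
        else:
            lo = mid + 1
    alpha = vals[lo - 1] if lo > 0 else 0.0
    beta = vals[lo] if lo < len(vals) else float("inf")
    return alpha, beta

def is_constant_size(alpha, beta):
    """The stopping test of the first stage: 10 * alpha >= beta / 10."""
    return 10 * alpha >= beta / 10
```

For instance, with candidates 1, 2, 4, 8, 16 and a hidden target of 5 the search returns the interval from 4 to 8, which already passes the constant-size test.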

We now continue the search using only decider and the simplified curves $\widehat{X} = \mathrm{simpl}(X, \mu)$
and $\widehat{Y} = \mathrm{simpl}(Y, \mu)$, where $\mu = \alpha'$. We extract the vertex-edge events of $\widehat{X}$ and $\widehat{Y}$ that are
smaller than $\beta'$, see Appendix A. To this end, we compute all edges of $\widehat{X}$ that are in distance
at most $\beta'$ of any vertex of $\widehat{Y}$ and vice versa using Data-Structure 2.9. Let $U'$ be the set of
resulting distances. We perform a binary search, using decider, to find the atomic interval
$\mathcal{I}'' = [\alpha'', \beta'']$ of $U' \cap \mathcal{I}'$ that contains the shortcut Fréchet distance between $\widehat{X}$ and $\widehat{Y}$.

Finally, we again search the margins of this interval, so that either we found the desired
approximation, or alternatively we output the interval $[10\alpha'', \beta''/10]$.

3.3.2 Second Stage — Searching over tunnel prices 

It remains to search over the canonical prices of tunnel families $T(e, e', u, v)$, where $e \neq e'$.^1
We have an interval $[\alpha, \beta] = [10\alpha'', \beta''/10]$, and simplified curves $\widehat{X}$ and $\widehat{Y}$ of which the
shortcut Fréchet distance is contained in $[\alpha, \beta]$ and approximates $d_S(X, Y)$. By Lemma 2.7
the number of vertex-edge pairs in distance $\beta$ is bounded by $O(n)$ and this set contains the
canonical gates which are feasible for any value in $[\alpha, \beta]$. Let $P$ denote the $m = O(n)$ points
in the parametric space that correspond to the canonical gates of these vertex-edge pairs;
that is, for every feasible pair $p$ (a vertex of $\widehat{Y}$) and $e$ (an edge of $\widehat{X}$), we compute the closest
point $q$ on $e$ to $p$, and place the point corresponding to $(q, p)$ in the free space into $P$.

It is sufficient to consider the tunnel families between these vertex-edge pairs, since all
other families are not feasible in the remaining search interval. Thus, if we did not care
about the running time, we could compute and search over the prices of the tunnels $P \times P$,
using Data-Structure 2.8. Since this is unacceptably slow, we use a more involved implicit
representation of these tunnels to carry out this task.

Implicit search over tunnel prices. Consider the implicit matrix of tunnel prices $M =
P \times P$ where the entry $M(i,j)$ is a $(1 + \varepsilon)$-approximation to the price of the canonical tunnel
$\tau(p_i, p_j)$. By Lemma 2.13, the first $j$ values of the $j$th row of this matrix are monotonically
decreasing up to a constant factor, since they correspond to tunnels that share the same
endpoint $p_j$ and are ordered by their starting points $p_i$ (we ignore the values in this matrix
above the diagonal). Using Data-Structure 2.8 we can $(1 + \varepsilon)$-approximate a value in the
matrix in polylogarithmic time per entry. Similarly, the lower triangle of this matrix is sorted
in increasing order in each column. As such, this matrix is sorted in both rows and columns
and one can apply the algorithm of Frederickson and Johnson [FJ84] to find the desired value.
This requires $O(\log m)$ calls to Decider, the evaluation of $O(m)$ entries in the matrix, and
takes $O(m)$ time otherwise. Here, we are using Decider as an exact decision procedure.
The algorithm will terminate this search with the desired constant factor approximation to
the shortcut Fréchet distance.

^1 Since for the case where $e = e'$ the canonical price coincides with the creation event value.
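To illustrate why sortedness in rows and columns lets one avoid evaluating all entries, here is a deliberately simpler stand-in: a staircase walk over an implicitly given matrix. It is not the algorithm of Frederickson and Johnson, it calls the predicate $O(m)$ times rather than $O(\log m)$ times, and it assumes a full matrix that is exactly non-decreasing along rows and columns (the paper's matrix is only sorted up to constant factors). The names `smallest_accepted`, `entry` and `accepts` are hypothetical.

```python
def smallest_accepted(entry, rows, cols, accepts):
    """Smallest matrix value accepted by a monotone predicate, or None.

    entry(r, c): evaluates one entry of an implicit matrix that is
                 non-decreasing along every row and every column (assumed).
    accepts(x):  monotone predicate, i.e. accepts(x) implies accepts(y) for y >= x.
    Evaluates at most rows + cols entries.
    """
    r, c = 0, cols - 1
    best = None
    while r < rows and c >= 0:
        value = entry(r, c)
        if accepts(value):
            if best is None or value < best:
                best = value
            c -= 1      # smaller entries of this row may still be accepted
        else:
            r += 1      # everything to the left in this row is rejected as well
    return best

# Toy matrix, sorted in rows and columns; entries are evaluated lazily and counted.
toy = [[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]]
evaluations = []
def toy_entry(r, c):
    evaluations.append((r, c))
    return toy[r][c]

result = smallest_accepted(toy_entry, 3, 3, lambda x: x >= 5)
```

On the toy matrix the walk touches five of the nine entries before reporting the smallest accepted value.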

4 Analysis 

4.1 Analysis of the tunnel procedure 

Lemma 4.1. Given the left gate $p$ of a free space interval $I^h_{i,j}$ and a set of gates $R$, and
parameters $0 < \varepsilon < 1$ and $\delta > 0$, the algorithm tunnel depicted in Figure 2 outputs one of
the following:
(i) A point $v \in I^h_{i,j}$ such that there exists a tunnel $\tau(q, v)$ of price $\mathrm{prc}(\tau(q, v)) \le (1 + \varepsilon)3\delta$
from a gate $q \in R$, or
(ii) null; in this case, there exists no tunnel of price less than or equal to $\delta$ between a gate
of $R$ and a point in $I^h_{i,j}$.
Furthermore, in case (i), there exists no other point $r \in [p, v]$ that is the endpoint of a tunnel
from $R$ with price less than or equal to $\delta$.

Proof: The correctness of this procedure follows from the monotonicity of the tunnel prices,
which is testified by Lemma 2.12. Let $\phi$ be the $(1 + \varepsilon)$-approximation to the price of the
tunnel that we compute in Line 2. This tunnel starts at a point in $R$ and ends in $p$ and
it corresponds to the shortest subcurve $\widetilde{X}$ of $X$ over any such tunnel. Lemma 2.12 implies
that if $\phi > 3\delta$ then there can be no other tunnel of price less than $\delta$ which corresponds to
a subcurve of $X$ that contains $\widetilde{X}$. Therefore, the price of any tunnel from a point $q \in R$,
which lies in the lower left quadrant of $p$, to a point that lies in the upper right quadrant of
$p$ has a price larger than $\delta$. In particular, this holds for those tunnels that end to the right
of $p$ in the same free space interval. The only other possibility for a tunnel from $R$ to $I^h_{i,j}$ is
a vertical tunnel that lies to the right of $p$. Observe that a vertical tunnel which is feasible
for $\delta$ always has price at most $\delta$, since it corresponds to a subcurve of $X$ that is equal to
a point which is in distance $\delta$ to the shortcut edge. In Line 4 and Line 5 we compute the
leftmost gate of $R$ in the lower right quadrant of $p$ which lies in the same column as $p$. If
there exists such a point with a vertical tunnel that ends in the free space interval $I^h_{i,j}$, then
we return the endpoint of this tunnel. Otherwise we can safely output the equivalent to the
answer that there exists no tunnel of price less than $\delta$. ■

4.2 Analysis of the decision procedure 

Clearly, the priority queue operations take time in $O(N \log N)$ and space in $O(N)$, where
$N = N_{\le\delta}(X, Y)$ is the size of the node set, which corresponds to the complexity of the free
space diagram. We invoke the tunnel procedure once for each node. Since we add at most
a constant number of gates for every cell to $R$, the size of this set is also bounded by $O(N)$.
Therefore, after the initialization the algorithm takes time near linear in the complexity of the
free space diagram. We can reduce this complexity by first simplifying the input curves with
$\mu = \Theta(\varepsilon\delta)$ before invoking the decider procedure, thereby paying another approximation
factor. We denote the resulting wrapper algorithm with Decider; it is depicted in Figure 4.
Now, the initial computation of the nodes takes near-linear time by Data-Structure 2.9 and
therefore the overall running time is near linear. A more detailed analysis of the running
time can be found in the following.

Lemma 4.2. Given parameters $\delta > 0$ and $0 < \varepsilon < 1$ and two $c$-packed polygonal curves $X$
and $Y$ of total complexity $n$. The algorithm Decider depicted in Figure 4 outputs one of
the following: (i) "$d_S(X,Y) \le (1 + \varepsilon)3\delta$", or (ii) "$d_S(X,Y) > \delta$". In any case, the output
returned is correct. The running time is $O(Cn \log^2 n)$, where $C = c^2\varepsilon^{-2d}\log(1/\varepsilon)$.

Proof: The algorithm Decider computes the simplified curves $\widehat{X} = \mathrm{simpl}(X, \mu)$ and $\widehat{Y} =
\mathrm{simpl}(Y, \mu)$ with $\mu = \Theta(\varepsilon\delta)$, before invoking the algorithm decider described in Figure 3 on
these curves. By the correctness of the tunnel procedure (i.e., Lemma 4.1), one can argue
by induction that the subsets of points of $\mathcal{R}_{\le\delta}(\widehat{X}, \widehat{Y})$ intersecting a grid edge are sufficiently
approximated by the reachable intervals computed by decider (see Eq. (4)). By Lemma 2.6
this approximates the decision with respect to the original curves sufficiently.



It remains to analyze the running time. By Lemma 2.7, the size of the node set of the
graph $G$ is bounded by $N = O(cn/\varepsilon)$. This also bounds the size of the point set $R$ and the
number of calls to the tunnel procedure, as those are at most a constant number per node.
During the tunnel procedure, which is depicted in Figure 2, we

(A) approximate the price of one tunnel in Line 2, and

(B) invoke two orthogonal range queries on the set $R$ in Line 1 and Line 5.

As for (A), building the data-structure that supports this kind of queries takes $T_1 =
O(n\varepsilon^{-2d}\log^2(1/\varepsilon)\log^2 n)$ time by Data-Structure 2.8. Since we perform $O(N)$ such queries,
this takes $T_2 = O(N\varepsilon^{-2}\log n \log\log n) = O(cn\varepsilon^{-3}\log n \log\log n)$ time overall. As for (B),
again, the set of gates $R$ is a finite set of two dimensional points and we can use two dimensional
range trees (with fractional cascading) to support the orthogonal range queries. We
want to build this tree by adding $O(N)$ points throughout the algorithm execution. Since the
range tree is a static data-structure, we have to make it dynamic, but we only need to support
insertions, and no deletions. This can be easily done by using the logarithmic method
if we allow an additional logarithmic factor to the running time, see also [BS80, Ove83].
In this method, the point set is distributed over $O(\log N)$ static range trees, which need
to be queried independently and which are repeatedly rebuilt throughout the algorithm.
Overall, maintaining this data-structure and answering the orthogonal range queries takes
$T_3 = O(N \log^2 N)$ time.
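The logarithmic method mentioned here can be sketched in a few lines of Python. To keep the example short, the static structure is a sorted list answering one dimensional range counting, standing in for the two dimensional range tree of the text; the class name `InsertOnlyRangeCounter` and this substitution are assumptions made for illustration only.

```python
import bisect

class InsertOnlyRangeCounter:
    """Insert-only dynamization of a static structure by the logarithmic method.

    levels[k] is either empty or a sorted list of exactly 2**k keys (the static
    structure). A query visits every non-empty level, so with N keys it
    touches O(log N) static structures.
    """

    def __init__(self):
        self.levels = []

    def insert(self, key):
        carry = [key]
        k = 0
        while True:
            if k == len(self.levels):
                self.levels.append([])
            if not self.levels[k]:
                self.levels[k] = carry        # free slot: store the rebuilt block
                return
            # Occupied slot: rebuild a block of twice the size and carry it upward.
            carry = sorted(carry + self.levels[k])
            self.levels[k] = []
            k += 1

    def count(self, lo, hi):
        """Number of stored keys x with lo <= x <= hi."""
        total = 0
        for block in self.levels:
            total += bisect.bisect_right(block, hi) - bisect.bisect_left(block, lo)
        return total

counter = InsertOnlyRangeCounter()
for key in (5, 1, 9, 3, 7):
    counter.insert(key)
```

After five insertions the occupied levels have sizes 1 and 4, mirroring the binary representation of 5, and each key has taken part in at most a logarithmic number of rebuilds.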

During the algorithm, we maintain a priority queue, where each node is added and extracted
at most three times. As such, the priority queue operations take time in $O(N \log N)$.
The initial computation of the node set takes $T_4 = O(n \log n + c^2 n/\varepsilon)$ by Data-Structure 2.9.
As such, the overall running time is $T_1 + T_2 + T_3 + T_4$, which is

$$O\bigl(n\varepsilon^{-2d}\log^2(1/\varepsilon)\log^2 n + cn\varepsilon^{-3}\log n \log\log n + cn\varepsilon^{-1}\log^2 n + n\log n + c^2 n/\varepsilon\bigr) = O(Cn\log^2 n),$$

where $C = c^2\varepsilon^{-2d}\log(1/\varepsilon)$. ■



Observation 4.3. It is easy to modify the decider algorithm such that it also outputs the
respective shortcut curve and reparametrization which realize the reported Fréchet distance. We
would modify the tunnel procedure such that it returns not only the endpoint, but also the
starting point of the computed tunnel. During the algorithm, we then insert an edge for
each computed tunnel, thereby creating at most three incoming edges to each node. After the
algorithm terminates, we can trace any path backwards from $(1,1)$ to $(0,0)$ in the subgraph
computed this way. This path encodes the shortcut curves as well as the reparametrizations.
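The backward trace can be sketched as a walk over stored predecessor edges. The dictionary layout below (one predecessor per node, tagged with the kind of edge) and the name `trace_back` are assumptions made for illustration; the paper only states that such edges are recorded.

```python
def trace_back(pred, source, target):
    """Follow predecessor edges from target back to source.

    pred maps a node to (previous_node, kind), where kind is a label such as
    "cell" or "tunnel" (hypothetical bookkeeping). Returns the steps in
    forward order as (from_node, to_node, kind), or None if target was not reached.
    """
    steps = []
    node = target
    while node != source:
        if node not in pred:
            return None
        prev, kind = pred[node]
        steps.append((prev, node, kind))
        node = prev
    steps.reverse()
    return steps

# Toy predecessor map: two cell-to-cell moves followed by one tunnel.
pred = {
    (0, 1): ((0, 0), "cell"),
    (1, 1): ((0, 1), "cell"),
    (2, 2): ((1, 1), "tunnel"),
}
path = trace_back(pred, (0, 0), (2, 2))
```

The steps labeled as tunnels give the shortcuts taken on the curve, and the remaining steps give the portions of the path that stay inside the free space.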

4.3 Analysis — understanding tunnel events 

The main algorithm uses the procedure Decider to perform a binary search for the minimum
$\delta$ for which the decision procedure returns "yes". In the problem at hand we are allowed to
use tunnels to traverse the free space diagram, and it is possible that a path becomes feasible
by introducing a tunnel. The algorithm has to consider this new type of critical event.

4.3.1 The canonical price of a tunnel family 

Consider the first time (i.e., the minimal value of $\delta$) that a decision procedure would try to
use a tunnel of a certain family.

Definition 4.4. Given a tunnel family $T(e_i, e_j, u, v)$, we call the minimal value of $\delta$ such
that $T_{\le\delta}(e_i, e_j, u, v)$ is non-empty the creation radius of the tunnel family and we denote
it with $r_{\mathrm{crt}}(e_i, e_j, u, v)$. (Note that the price of a tunnel might be considerably larger than
its creation radius.)

Lemma 4.5. The creation radius $r_{\mathrm{crt}}(e_i, e_j, u, v) = r_{\min}(e_i, e_j, u, v)$, see Definition 2.4.



Proof: Recall that the creation radius of the tunnel family is the minimal value of $\delta$ such
that any tunnel in this family is feasible. Let $u' = \mathrm{nn}(u, e_i)$ and $v' = \mathrm{nn}(v, e_j)$. If $u'$ appears
before $v'$ on $X$, then the canonical tunnel is realized by $X(x_p) = u'$ and $X(x_q) = v'$ and the
claim holds. In particular, this is the case if $i < j$.

Now, the only remaining possibility is that $u'$ appears after $v'$ on $e$. It must be that $i = j$,
therefore let $e = e_i = e_j$. Observe that in this case any tunnel in the family which is feasible
for $\delta$ also has a price that is smaller or equal to $\delta$. Consider the point $r$ realizing the quantity

$$\min_{r \in e} \max(\|r - u\|, \|r - v\|).$$

Note that $r$ is the subcurve of $X$ corresponding to the (vertical) canonical tunnel in this case.
We claim that for any subsegment $\hat{u}\hat{v} \subseteq e$ (agreeing with the orientation of $e$) we have that
$d_F(\hat{u}\hat{v}, uv) \ge d_F(r, uv)$. If $\hat{u} = \hat{v}$ then the claim trivially holds.

Assume that $v'$ appears after $\hat{u}$ along $e$ (the case depicted in Figure 5). Since $u'$ appears
after $v'$ along $e$, we have that $\|v' - u\| \le \|\hat{u} - u\|$, as moving away from $u'$ only increases the
distance from $u$. Therefore,

$$d_F(r, uv) \le d_F(v', uv) = \max(\|v' - u\|, \|v' - v\|) \le \max(\|\hat{u} - u\|, \|\hat{v} - v\|) = d_F(\hat{u}\hat{v}, uv).$$




Figure 5: Two cases: $v'$ appears either before or after $\hat{u}$ along $e$, assuming that $u'$ appears
after $v'$ on $e$.

Otherwise, if $v'$ appears before $\hat{u}$ along $e$, as depicted in Figure 5 on the right, then

$$d_F(r, uv) \le d_F(\hat{u}, uv) = \max(\|\hat{u} - u\|, \|\hat{u} - v\|) \le \max(\|\hat{u} - u\|, \|\hat{v} - v\|) = d_F(\hat{u}\hat{v}, uv),$$

since moving away from $v'$ only increases the distance from $v$.

This implies that the minimum $\delta$ for a tunnel in $T(e_i, e_i, u, v)$ to be feasible is at least
$d_F(r, uv) = r_{\mathrm{crt}}(e_i, e_i, u, v)$. And $r$ testifies that there is a tunnel in this family that is feasible
for this value. ■
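For the same-edge case, the quantity $\min_{r \in e}\max(\|r - u\|, \|r - v\|)$ is the minimum of a convex function along the segment, so it can be evaluated numerically by ternary search. The Python sketch below does this; the function name `same_edge_creation_radius`, the iteration count and the sample coordinates are assumptions for illustration, and a closed-form solution would serve equally well.

```python
import math

def same_edge_creation_radius(a, b, u, v, iterations=200):
    """Minimize max(|r - u|, |r - v|) over points r on the segment ab.

    The objective is a maximum of two convex functions of the position
    along the segment, hence convex, so ternary search converges.
    Returns (optimal value, optimal point r).
    """
    def point(t):
        return tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))

    def cost(t):
        r = point(t)
        return max(math.dist(r, u), math.dist(r, v))

    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if cost(m1) <= cost(m2):
            hi = m2
        else:
            lo = m1
    t = (lo + hi) / 2
    return cost(t), point(t)

# Symmetric toy instance (assumed): the optimum is the midpoint of the edge.
radius, r = same_edge_creation_radius((0.0, 0.0), (10.0, 0.0), (2.0, 1.0), (8.0, 1.0))
```

In this instance the value is $\sqrt{10}$, attained at the point $(5, 0)$, and it lies within a factor of two of $\|u - v\| = 6$.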

The following lemma describes the behavior when $\delta$ rises above a tunnel price, such that
the area in the free space that lies beyond this tunnel potentially becomes reachable by using
this tunnel. More specifically, it implies that the first time (i.e., the minimal value of $\delta$) that
any tunnel of a family $T(e_i, e_j, u, v)$ is usable (i.e., its price is less than $\delta$), any tunnel in the
feasible set $T_{\le\delta}(e_i, e_j, u, v)$ associated with this family will be usable.

Lemma 4.6. Given a value $\delta > 0$, we have for any tunnel $\tau(f, g)$ in the feasible subset of a
given tunnel family $T_{\le\delta}(e_i, e_j, u, v)$, that

(i) if $\delta \le \mathrm{prc}(\tau_{\min}(e_i, e_j, u, v))$, then $\mathrm{prc}(\tau(f, g)) = \mathrm{prc}(\tau_{\min}(e_i, e_j, u, v))$,

(ii) otherwise, $\mathrm{prc}(\tau(f, g)) \le \delta$.

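Proof: We first handle the case that $i \neq j$. Let $e_i = p_i p_{i+1}$ and $e_j = p_j p_{j+1}$.
Let $p \in e_i$ and $q \in e_j$ be some points on these
edges that correspond to $f$ and $g$, respectively.
Observe that since this is a feasible tunnel in this
family, we have that

$$\max(\|p - u\|, \|q - v\|) \le \delta.$$

Consider the optimal Fréchet matching of $X\langle p, q\rangle$ with $uv$, and let $u_{\mathrm{opt}}$ and $v_{\mathrm{opt}}$ be the
points on $uv$ that are matched to $p_{i+1}$ and $p_j$ by this optimal Fréchet matching. Let $\alpha =
d_F(X\langle p_{i+1}, p_j\rangle, u_\alpha v_\alpha)$, where $u_\alpha v_\alpha$ is the subsegment of $uv$ minimizing $\alpha$.
We have, by Observation A.1, that

$$\begin{aligned}
d_F(X\langle p, q\rangle, uv)
 &= \max\bigl(d_F(p\,p_{i+1}, u\,u_{\mathrm{opt}}),\ d_F(X\langle p_{i+1}, p_j\rangle, u_{\mathrm{opt}} v_{\mathrm{opt}}),\ d_F(p_j q, v_{\mathrm{opt}} v)\bigr) \\
 &= \max\bigl(\|p - u\|,\ \|p_{i+1} - u_{\mathrm{opt}}\|,\ d_F(X\langle p_{i+1}, p_j\rangle, u_{\mathrm{opt}} v_{\mathrm{opt}}),\ \|p_j - v_{\mathrm{opt}}\|,\ \|q - v\|\bigr) \\
 &= \max\bigl(\|p - u\|,\ d_F(X\langle p_{i+1}, p_j\rangle, u_{\mathrm{opt}} v_{\mathrm{opt}}),\ \|q - v\|\bigr) \\
 &\ge \max\bigl(\|p - u\|,\ d_F(X\langle p_{i+1}, p_j\rangle, u_\alpha v_\alpha),\ \|q - v\|\bigr) \\
 &= \max\bigl(d_F(p\,p_{i+1}, u\,u_\alpha),\ d_F(X\langle p_{i+1}, p_j\rangle, u_\alpha v_\alpha),\ d_F(p_j q, v_\alpha v)\bigr) \\
 &\ge d_F(X\langle p, q\rangle, uv).
\end{aligned}$$

For $\alpha = d_F(X\langle p_{i+1}, p_j\rangle, u_\alpha v_\alpha)$, this implies $d_F(X\langle p, q\rangle, uv) = \max(\|p - u\|, \alpha, \|q - v\|) \le \max(\alpha, \delta)$,
where $\alpha$ is equal for all tunnels in the family. Now, if $\delta \le \mathrm{prc}(\tau_{\min}(e_i, e_j, u, v))$,
then we have $\mathrm{prc}(\tau_{\min}(e_i, e_j, u, v)) = \alpha \ge \delta$ and

$$\mathrm{prc}(\tau(f, g)) = d_F(X\langle p, q\rangle, uv) = \max(\|p - u\|, \alpha, \|q - v\|) \le \max(\alpha, \delta) = \alpha.$$

This proves (i). Otherwise, we have $\mathrm{prc}(\tau_{\min}(e_i, e_j, u, v)) < \delta$, which implies that $\alpha < \delta$,
but then $\mathrm{prc}(\tau(f, g)) \le \delta$, implying (ii).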

If $i = j$ then the Fréchet distance is between the shortcut segment and a subsegment of
$e_i$. But this distance is the maximum distance between the corresponding endpoints, by
Observation A.1. As the distance between endpoints of shortcuts and subcurves corresponding
to tunnels of $T_{\le\delta}(e_i, e_j, u, v)$ is at most $\delta$, and by Lemma 4.5, the claim follows. ■

Definition 4.7. For a specific tunnel family $T(e_i, e_j, u, v)$, we call the price of its canonical
tunnel $\tau_{\min}(e_i, e_j, u, v)$ the canonical price of this tunnel family.



Lemma 4.8 below implies that the set of creation radii of all tunnels is approximated
by the set of vertex-vertex and vertex-edge event radii. A similar lemma was shown in
[DHW10] to prove this property for the monotonicity event values. Therefore, the algorithm
eliminates these types of events in the first stage, in addition to eliminating the vertex-vertex
and vertex-edge events.

Lemma 4.8. Consider an edge $e = pq$ of a curve $X$, and two vertices $u$ and $v$ of a curve
$Y$. We have that $x/2 \le r_{\mathrm{crt}}(e, e, u, v) \le 2x$, where $x$ is in the set $\{d(u, e), d(v, e), \|u - v\|\}$.

Proof: First, observe that $r_{\mathrm{crt}}(e, e, u, v) \ge \|u - v\|/2$, as it is the maximum distance of some
point on $e$ from both $u$ and $v$. In particular, if $r_{\mathrm{crt}}(e, e, u, v) \le 2\|u - v\|$ then we are done.
As such, it must be that $r_{\mathrm{crt}}(e, e, u, v) > 2\|u - v\|$. Assume that $u$ is closer to $e$ than $v$,
and let $u'$ be the closest point on $e$ to $u$. By the triangle inequality, the distance of $v$ from $u'$
is in the range $\mathcal{I} = [\|u - u'\|, \|u - u'\| + \|u - v\|]$. Observe that $r_{\mathrm{crt}}(e, e, u, v) \ge \|u - u'\|$
and $r_{\mathrm{crt}}(e, e, u, v) \le \max(\|u - u'\|, \|v - u'\|)$. As such, $r_{\mathrm{crt}}(e, e, u, v) \in \mathcal{I}$. Note that if
$\|u - u'\| \le \|u - v\|$ then we are done, as this implies that $r_{\mathrm{crt}}(e, e, u, v)$ is in the range
$[\|u - v\|/2, 2\|u - v\|]$. Otherwise, $r_{\mathrm{crt}}(e, e, u, v)$ is in the range $[\|u - u'\|, 2\|u - u'\|]$. In
either case, the claim follows.

The case that $v$ is closer to $e$ than $u$ follows by symmetry. ■



4.4 Analysis of the main algorithm 

The following lemma can be obtained using similar arguments as in the analysis of the main
algorithm in [DHW10]. We provide a simplified proof for the case here, where we are only
interested in a constant factor approximation.

Lemma 4.9. Given two $c$-packed polygonal curves $X$ and $Y$ with total complexity $n$. The
first stage of the algorithm (see Section 3.3.1) outputs one of the following:

(A) an $O(1)$-approximation to the shortcut Fréchet distance between $X$ and $Y$;

(B) an interval $\mathcal{I}$, and curves $\widehat{X}$ and $\widehat{Y}$ with the following properties:

(i) $d_S(\widehat{X}, \widehat{Y})$ is contained in $\mathcal{I}$ and $d_S(\widehat{X}, \widehat{Y})/3 \le d_S(X, Y) \le 3\,d_S(\widehat{X}, \widehat{Y})$,
(ii) $\mathcal{I}$ contains no vertex-edge, vertex-vertex, or monotonicity event values and no
tunnel creation radii (as defined in Section 4.3) of $\widehat{X}$ and $\widehat{Y}$.
The running time is $O(c^2 n \log^3 n)$.

Proof: We first prove the correctness of the algorithm as stated in the claim. The set $U$
approximates the vertex-vertex distances of the vertices of $X$ and $Y$ up to a factor of two.
Therefore, the interval $\mathcal{I} = [\alpha, \beta]$, which we obtain from the first binary search, contains
no vertex-vertex distances of $X$ that are more than a factor of two away from its boundary.
This implies that the simplification $\widehat{X} = \mathrm{simpl}(X, \mu)$ results in the same curve for any
$\mu \in [3\alpha, \beta/3]$. An analogous statement holds for $Y$. Unless a constant factor approximation
is found either in the interval $[\alpha, 10\alpha]$ or the interval $[\beta/10, \beta]$, the algorithm continues the
search using the procedure decider and the curves simplified with $\mu = 3\alpha$.

It is now sufficient to search for a constant factor approximation to $d_S(\widehat{X}, \widehat{Y})$ in the
interval $\mathcal{I}' = [3\alpha, \beta/3]$, since this will approximate the desired Fréchet distance by a constant
factor. Indeed, by the result of the initial searches, we have that $3\mu < 10\alpha \le d_S(X, Y)$.
Lemma 2.6 implies that $d_S(\widehat{X}, \widehat{Y}) \le d_S(X, Y) + 2\mu \le 3\,d_S(X, Y)$. On the other hand, the
same lemma implies that $d_S(\widehat{X}, \widehat{Y}) \ge d_S(X, Y) - 2\mu \ge d_S(X, Y)/3$. This implies that
$d_S(\widehat{X}, \widehat{Y}) \in \mathcal{I}' = [3\alpha, \beta/3]$, since $d_S(X, Y) \in [10\alpha, \beta/10]$. Note that this also proves the
correctness of (i), since the returned interval is contained in $\mathcal{I}'$.

Observe that the set of vertex-vertex distances of $\hat{X}$ and $\hat{Y}$ is contained in the set of
vertex-vertex distances of X and Y. Clearly, $\mathcal{I}'$ cannot contain any vertex-vertex distances
of $\hat{X}$ and $\hat{Y}$. The algorithm therefore extracts the remaining vertex-edge events U' from the
free space diagram and performs a binary search on them. We obtain the atomic interval
$\mathcal{I}'' = [\alpha'', \beta'']$, which contains no vertex-edge events of $\hat{X}$ and $\hat{Y}$. Note that by Eq. ^ and
Lemma 4.5, the monotonicity event values, as described in Appendix Kl, coincide with the
values of $\delta$ where a tunnel within a column of the parametric space becomes feasible, that is,
with the quantity $r_{\mathrm{crt}}(e, e, u, v)$. By Lemma 4.8, these event values would have to lie within a
factor two of the boundaries of the interval $\mathcal{I}''$. Therefore, we again search the margins of this
interval, so that either we find the desired approximation, or alternatively, it must be in the
interval $\mathcal{I}''' = [10\alpha'', \beta''/10]$, which now contains no vertex-vertex, vertex-edge, monotonicity
or tunnel creation events of $\hat{X}$ and $\hat{Y}$. Since $\mathcal{I}'''$ is the interval that the algorithm returns,
unless it finds a constant factor approximation to the desired Fréchet distance, the above
argumentation implies (i) and (ii).

As for the running time, computing the set U using well-separated pairs decomposition
can be done in $O(n \log n)$, see [DHW10]. Computing the set U' takes time in O(n log n + c^n),
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by Data-Structure 2.9 with /i = /3/3 and S = p. The algorithm invokes the decision procedure 



$O(\log n)$ times, and this dominates the overall running time, see Lemma 4.2.



Lemma 4.10. Given two c-packed polygonal curves X andY of total complexity n. One can 
compute a constant factor approximation to A§,{X^Y). The running time is 0{c^n\og n) . 

Proof: First, the algorithm performs the preliminary computations as described in
Section 3.3.1. By Lemma 4.9, we either find a constant factor approximation, or we obtain
an interval $[\alpha, \beta]$ and simplified curves $\hat{X}$ and $\hat{Y}$. Furthermore, the interval $[\alpha, \beta]$ does not
contain any vertex-vertex, vertex-edge, monotonicity, or tunnel creation events of $\hat{X}$ and $\hat{Y}$.
Let P be the canonical gates that are feasible in the $\beta$-free space of $\hat{X}$ and $\hat{Y}$. We have that
$m = |P| = O(n)$ and we can compute them using Data-Structure 2.9 in O(n log n + c^n)
time, for $\varepsilon = 1/3$. Thus, the running time up to this stage is bounded by O(c^n log^ n), by
Lemma 4.9.



Now, we invoke the algorithm described in Section 3.3.2 on the matrix of implicit tunnel
prices defined by P and return the output as our solution.

Consider a monotone path in the parametric space that corresponds to the optimal
solution. If the price of this path is determined by either a vertex-vertex, a vertex-edge or a
monotonicity event, then we have taken care of these values in the first stage of the search
algorithm. If it is dominated by a tunnel price and this tunnel has both endpoints in the
same column of the free space, then by Lemma 4.8 the price of such a tunnel is outside the
interval $[\alpha, \beta]$, since by Lemma 4.9 these critical values were eliminated in the first stage.
Otherwise, this critical tunnel has to be between two columns. Let $\delta$ be the price of this
tunnel (which is also the price of the whole solution).

Consider what happens to this path if we slightly decrease $\delta$. Since $\delta$ is optimal, the path
either ceases to be feasible or the price of the path does not change.

If the critical tunnel is no longer feasible, then one of its endpoints is also an endpoint
of the free space interval it lies on. Consider the modified path in the free space, which uses
the new endpoint of the free space interval. If the free space interval is empty, then this
corresponds to a vertex-edge event, and this is not possible inside the interval $[\alpha, \beta]$. The
other possibility is that the path is no longer monotone. However, this corresponds to a
monotonicity event, which again we already handled because of Lemma 4.9.

If the tunnel is still feasible, then it must be that the endpoints of this tunnel are contained
in the interior of the free space interval and not on its boundary. Now Lemma 4.6 (i) implies
that the price of this tunnel is equal to the price of the canonical tunnel. As such, the price
of the optimal solution is being approximated correctly in this case.

Observe that in the second stage we are searching over all tunnel events that lie in the 
remaining search interval (whether they are relevant in our case or not). As such, the search 
would find the correct critical value, as it is one of the values considered in the search. 

The running time of the second stage is bounded by:

(A) $O(n \log n \log\log n)$ time to compute the needed entries in the matrix, using Data-Structure 2.1.

(B) 0((c^nlog rij logn) time for the O(logn) calls to Decider. 

(C) 0{n) for other computations. 

As such, the overall running time of the algorithm is O(c^nlog^n). ■ 
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4.5 Result 

The following theorem states the main result for approximating the shortcut Frechet distance. 

Theorem 4.11. Given two c-packed polygonal curves X andY , with total complexity n, and 
a parameter e > 0, the algorithm of Sectionl^ computes {3 + e)- approximation to the shortcut 
Frechet distance between X and Y in 0(c^n log^n) time. The algorithm also outputs the 
shortcut curve of Y and the reparametrizations that realize the respective shortcut Frechet 
distance. 



Proof: The result follows from Lemma 4.10. We can turn any constant factor approximation
into a $(3 + \varepsilon)$-approximation, using Decider with $\varepsilon' = \varepsilon/3$ in a binary search over a constant
number of subintervals $[\alpha, \beta]$ where $\beta = (3 + \varepsilon)\alpha$. It is easy to modify the algorithm, such
that it also outputs the shortcut curve and the reparametrizations realizing the approximate
Fréchet distance, see Observation 4.3.
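
The refinement step can be illustrated with a small sketch. It is not the exact procedure of the proof: it assumes a hypothetical callable `decider(delta, eps1)` that returns `True` only if the distance is at most 3(1 + eps1)·delta and `False` only if the distance exceeds delta, it assumes an estimate `approx` with dist ≤ approx ≤ factor·dist and dist > 0, and it uses a geometric bisection with eps1 = eps/10 (a more conservative constant than the one quoted above, chosen so that the invariants are easy to verify for eps ≤ 1).

```python
import math

def refine_approximation(decider, approx, factor, eps):
    """Turn a constant-factor estimate into a (3 + eps)-approximation.

    Assumptions (hypothetical interface, not taken from the paper):
      * dist <= approx <= factor * dist for the unknown distance dist > 0,
      * decider(delta, eps1) is True  only if dist <= 3 * (1 + eps1) * delta,
      * decider(delta, eps1) is False only if dist > delta.
    """
    eps1 = eps / 10.0
    lo = approx / (2.0 * factor)   # invariant: dist > lo
    hi = approx                    # invariant: dist <= 3 * (1 + eps1) * hi
    while hi > (1.0 + eps1) * lo:
        mid = math.sqrt(lo * hi)   # bisect on a logarithmic scale
        if decider(mid, eps1):
            hi = mid
        else:
            lo = mid
    # dist <= result <= 3 * (1 + eps1)**2 * lo < (3 + eps) * dist for eps <= 1
    return 3.0 * (1.0 + eps1) * hi
```

Because the ratio hi/lo starts as a constant and its logarithm halves in every round, the loop makes O(log(1/eps)) calls to the decider, independent of n.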



5 Approximation algorithm for the k-shortcut Fréchet distance

In this section, we describe an algorithm that can be used if the number of shortcuts desired
is bounded by a prespecified integer k. The running time of this algorithm is also near-linear
in n and has linear dependency on k.

The main algorithm is identical to the algorithm used in the unbounded case (see Sec- 
tion 3.3), except that it uses fc-Decider (Figure ItI) instead of Decider (Figure 111). As such. 



we only describe and analyze the decision procedure. 

5.1 Basic tools 

Lemma 5.1. Given a finite set of points P in the $\delta$-free space diagram of two polygonal
curves X and Y of total complexity n (such that no cell in the free space contains more than
O(1) points of P), one can compute $\mathcal{R}_{\le\delta}(P)$ in time $O(|P| + N_{\le\delta}(X, Y))$.

Proof: For each point of P we know the cell of the free space that contains it. So, consider
the subset of P contained in a particular cell $C_{i,j}$. Out of this subset, we only need to consider
the leftmost and lowermost point that is contained in $\mathcal{D}_{\le\delta}(X, Y)$ for the computation of
the reachability intervals at the top and right boundary of this cell. The other points have no
effect on the outcome. Recall that the complexity of the free space inside a cell is constant.
We know for each $p \in P$ the cell $C_{i,j}$ that contains it on the inside or has it on its lower or
left boundary. We can filter out the irrelevant points in linear time. (As such, assume that
P contains only relevant points.)

We sort the points of P by the indices of the cells that contain them (i.e., a point associated
with $C_{i,j}$ appears before a point associated with $C_{i',j'}$ if $j < j'$, or $j = j'$ and $i < i'$). This
sorting can be done in linear time using radix sort. Now, we deploy the BFS approach, as
used in Section 3.2.1, to compute the reachable free space. The BFS would use two queues:
one for the currently visited cells, and the precomputed one (i.e., the ordering computed by
k-decider(k, X, Y, $\varepsilon$, $\delta$)
 1: Assert that $d(0,0) = \|X(0) - Y(0)\| \le \delta$ and $d(1,1) \le \delta$
 2: Let $L_0$ be the set of gates of $\mathcal{D}_{\le\delta}(X, Y)$
 3: Let $S_0 = \{(0,0)\}$
 4: for $i = 1, \ldots, k$ do
 5:     Compute $\mathcal{R}_{\le\delta}(S_{i-1})$
 6:     Let $L_i = L_{i-1} \setminus \mathcal{R}_{\le\delta}(S_{i-1})$
 7:     Let $R_i$ be the set of gates of $\mathcal{R}_{\le\delta}(S_{i-1})$
 8:     for $p = (x_p, y_p) \in L_i$ do
 9:         v = tunnel($R_i$, p, $\varepsilon$, $\delta$)
10:         if v $\ne$ null then
11:             Add v to $S_i$
12: if (1, 1) is contained in $\mathcal{R}_{\le\delta}(S_0) \cup \cdots \cup \mathcal{R}_{\le\delta}(S_k)$ then
13:     Return "$d_{\mathcal{S}}(k, X, Y) \le (1 + \varepsilon)3\delta$"
14: else
15:     Return "$d_{\mathcal{S}}(k, X, Y) > \delta$"

Figure 6: The decision procedure k-decider for the k-shortcut Fréchet distance.



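k-Decider(k, X, Y, $\varepsilon$, $\delta$)
1: Let $\varepsilon' = \varepsilon/10$
2: Compute $\hat{X} = \mathrm{simpl}(X, \mu)$ and $\hat{Y} = \mathrm{simpl}(Y, \mu)$ with $\mu = \varepsilon'\delta$
3: Call k-decider(k, $\hat{X}$, $\hat{Y}$, $\varepsilon'$, $\delta'$) with $\delta' = (1 + 2\varepsilon')\delta$
4: Return either "$d_{\mathcal{S}}(k, X, Y) \le (1 + \varepsilon)3\delta$" or "$d_{\mathcal{S}}(k, X, Y) > \delta$"

Figure 7: The resulting decision procedure k-Decider. A detailed description of the
complete algorithm is given in Section 5.2.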



the radix sort), to figure out which cells need to be explored. At each iteration, the BFS
visits the minimum cell in the two queues. ■



5.2 The decision algorithm 

We now describe an approximate decision procedure for the k-shortcut Fréchet distance,
where k is a prespecified number of shortcuts allowed. As in the previous algorithm, we will
use the tunnel procedure described in Section 3.1. The new decision procedure k-decider
is depicted in Figure 6. The idea of this algorithm is to incrementally approximate the
i-reachable free space, for $i \le k$. In each step of the iteration we compute the free space that
is locally reachable from the endpoints of the tunnels that were computed in the previous
iteration (Line 5). This can be done efficiently using Lemma 5.1. We extract the set of gates
$R_i$ from this reachable free space. Let $L_i$ denote the set of gates in $\mathcal{D}_{\le\delta}(X, Y)$ that have
not been reached so far. Now we would like to connect from gates in $R_i$ to gates in $L_i$ via
affordable tunnels using the tunnel procedure, which is depicted in Figure 2 and which we
described and analyzed already. By Lemma 4.1, this procedure returns a tunnel of price
at most $(1 + \varepsilon)3\delta$, given that there exists such a tunnel of price at most $\delta$. The quadrant
queries are performed on a static two-dimensional range tree, which we build on the set of
discovered gates $R_i$ for each iteration. The initial set of "undiscovered" gates $L_0$ can be
computed using Data-Structure 2.9. If, after k steps of this algorithm, the point (1, 1) has
not been reached, then we know that $d_{\mathcal{S}}(k, X, Y) > \delta$. Otherwise, we know that there exists
a k'-shortcut curve Y' for $0 \le k' \le k$, such that $d_{\mathcal{F}}(X, Y') \le (1 + \varepsilon)3\delta$. As in the previous
algorithm, we simplify the curves with $\mu = \Theta(\varepsilon\delta)$ before invoking the k-decider procedure
in order to ensure that the complexity of the free space diagram is near-linear. The resulting
wrapper algorithm is called k-Decider and is depicted in Figure 7.

5.3 Analysis 

Here, we analyze the decision algorithm. The analysis of the main algorithm is presented in
Section 4.4.

Lemma 5.2. For every $r = (x_r, y_r) \in \mathcal{R}_{\le\delta}(S_i)$, as computed by k-decider (depicted in
Figure 6), it holds that $r \in \mathcal{R}^{i}_{\le(1+\varepsilon)3\delta}(X, Y)$.

Proof: We prove that for any such r, it holds that $d_{\mathcal{S}}(i, X[0, x_r], Y[0, y_r]) \le (1 + \varepsilon)3\delta$. This
is equivalent to showing that there exists a decomposition $X[0, x_r] = X_0 + X_1 + X_2 + \cdots + X_{2i}$
and a decomposition $Y[0, y_r] = Y_0 + Y_1 + Y_2 + \cdots + Y_{2i}$, such that for $j = 0, \ldots, i$, it holds that
$d_{\mathcal{F}}(X_{2j-1}, Y_{2j-1}) \le (1 + \varepsilon)3\delta$ and $d_{\mathcal{F}}(X_{2j}, Y_{2j}) \le (1 + \varepsilon)3\delta$. By the definition of the locally
reachable free space, $d_{\mathcal{F}}(X[0, x_r], Y[0, y_r]) \le \delta \le (1 + \varepsilon)3\delta$ for any point r in $\mathcal{R}_{\le\delta}(S_0)$. As
such, the claim is clearly true for $i = 0$.

For $i > 0$, we inductively decompose the curves such that the pieces satisfy the above
condition. For any point r in $\mathcal{R}_{\le\delta}(S_i)$ there exists a point $u = (x_u, y_u) \in S_i$, such that u is
connected by an (x, y)-monotone path to r. This is ensured by Lemma 5.1. Let $X_{2i} = X[x_u, x_r]$
and $Y_{2i} = Y[y_u, y_r]$. We have that $d_{\mathcal{F}}(X_{2i}, Y_{2i}) \le \delta \le (1 + \varepsilon)3\delta$, since this path is contained
in $\mathcal{D}_{\le\delta}(X_{2i}, Y_{2i})$.

The point u was added to $S_i$ because it was returned as the endpoint of a tunnel by the
tunnel procedure in Line 9. By Lemma 2.12, this tunnel starts at a point $q \in R_i \subseteq \mathcal{R}_{\le\delta}(S_{i-1})$
and has the property that $d_{\mathcal{F}}(X[x_q, x_u], Y[y_q, y_u]) \le (1 + \varepsilon)3\delta$. We set $X_{2i-1} = X[x_q, x_u]$
and $Y_{2i-1} = Y[y_q, y_u]$. By induction, we have that there exists an $(i-1)$-shortcut curve of
$Y[0, y_q]$ that has Fréchet distance at most $(1 + \varepsilon)3\delta$ to $X[0, x_q]$. Concatenating these curves
with the tunnel $\tau(q, u)$, and the monotone path from u to r, implies the claim. ■

Lemma 5.3. After each iteration of the outer for-loop as executed by the algorithm
k-decider (depicted in Figure 6), it holds that the i-reachable free space $C_i = \mathcal{R}^{i}_{\le\delta}(X, Y)$ is
contained in $\bigcup_{0 \le j \le i} \mathcal{R}_{\le\delta}(S_j)$.

Proof: By the definition of the locally reachable free space, the claim is clearly true for $i = 0$.
For $i > 0$, assume for the sake of contradiction that the claim is false, and furthermore,
that this is the minimal i for which the claim fails. As such, we can assume that the claim is
true for any smaller value of i. That is, for any q such that $d_{\mathcal{S}}(i-1, X[0, x_q], Y[0, y_q]) \le \delta$,
we have that $q \in \mathcal{R}_{\le\delta}(S_j)$ for some $j \le i-1$; in other words, the $(i-1)$-reachable free
space is computed correctly in previous iterations of the loop. Now, failing the claim means
that the algorithm missed a point $p \in C_i \setminus \mathcal{R}_{\le\delta}(S_i)$, such that there exists an i-shortcut curve
of $Y[0, y_p]$ which is within Fréchet distance $\delta$ to $X[0, x_p]$. Let P denote the path in the
parametric space realizing this distance. We will show that the existence of such a path P
implies that $p \in \mathcal{R}_{\le\delta}(S_i)$ as computed by the algorithm, which then implies the claim.

Let u and r be the points in the parametric space such that ur is the last tunnel taken by
P, and P connects r to p by a monotone path inside $\mathcal{D}_{\le\delta}(X, Y)$. We have that $\mathrm{prc}(\tau(u, r)) \le \delta$.
Furthermore, it must be that $u \in \mathcal{R}_{\le\delta}(S_{i-1})$, since u is part of the $(i-1)$-reachable free space,
which was computed correctly. So, let u' and u'' be the gates of the reachability interval of
$\mathcal{R}_{\le\delta}(S_{i-1})$ which contains u (as computed by Lemma 5.1). Similarly, let r' be the left gate
of the edge containing r. Observe that $u', u'' \in R_i$ and $r' \in L_i$. Consider the tunnel $\tau(u', r)$.
It must be that its price is less than or equal to $\delta$, since $\mathrm{prc}(\tau(u, r)) \le \delta$. The algorithm
will invoke the tunnel procedure with the parameters $R_i$ and r'. By Lemma 4.1, and since
$\mathrm{prc}(\tau(u', r)) \le \delta$, it must return the endpoint v of an affordable tunnel that lies in the free
space interval of r. Furthermore, the returned point is the leftmost such point, that is, no
other tunnel from $R_i$ of price less than or equal to $\delta$ can end at a point in $[r', v]$. Therefore,
v must lie to the left of r or be equal to r. The algorithm would then add v to the set $S_i$.
This implies that r, and therefore also p, are contained in $\mathcal{R}_{\le\delta}(S_i)$. ■

We now prove the correctness and running time of A;-Decider for approximating the 
fc-shortcut Frechet distance. 

Lemma 5.4. Given three parameters $k \in \mathbb{N}$, $\delta > 0$ and $0 < \varepsilon < 1$, and two c-packed
polygonal curves X and Y of total complexity n, the algorithm k-Decider depicted in
Figure 7 outputs one of the following: (i) "$d_{\mathcal{S}}(k, X, Y) \le (1 + \varepsilon)3\delta$", or (ii) "$d_{\mathcal{S}}(k, X, Y) > \delta$".
In any case, the output returned is correct. The running time is 0(Ckn\og n) , where 
C = 0^6-^'^ \og{l/e). 

Proof: The algorithm k-Decider computes the simplified curves $\hat{X} = \mathrm{simpl}(X, \mu)$ and
$\hat{Y} = \mathrm{simpl}(Y, \mu)$ with $\mu = \Theta(\varepsilon\delta)$, before invoking the algorithm k-decider described in
Figure 6 on these curves. If, by the end of the computations, $(1, 1) \in \mathcal{R}_{\le\delta}(S_0) \cup \cdots \cup \mathcal{R}_{\le\delta}(S_k)$,
as tested in Line 12, then Lemma 5.2 proves that the output returned in Line 13 is correct
with respect to the simplified curves. Otherwise, Lemma 5.3 shows that the output in Line 15
is correct with respect to them. By Lemma 2.6, this approximates the decision with respect
to the original curves sufficiently.

We assume that we have an annotated graph representation of the free space diagram as
used in Lemma 5.1. As such, we can easily extract the gates contained in $\mathcal{R}_{\le\delta}(S_i)$ during its
computation in Line 5, as well as check whether it contains (1, 1).

The sets Ri, Si and Li are finite sets of two-dimensional points, and as such, we can use 
static two dimensional range trees for the orthogonal range queries in Line [T] and Line |5] in 
the tunnel procedure. 



As for the running time, observe that by Lemma 2.7 there are at most $N = O(cn/\varepsilon)$
points in $L_i$, $S_i$ and $R_i$ at any point in time. As such, building the range trees,
at each iteration, takes $O(N \log N)$ time, and this also accounts for all the queries performed
on these data-structures. Note that computing the reachable free space in Line |6] at each 



iteration takes time in Od^illogn -|- A^logA^) = O(A^logA^) by Lemma |5.1[ Therefore, 
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maintaining the sets Ri, Si and Lj and all operations on them take Ti = 0{kN log N) time 
overall. Computing the initial set of gates Lq in Line pi takes T2 = 0(?7,log?7, + c^n/e) by 
Data-Structure 12.91 

Building the data-structure that supports the queries for the price of a tunnel takes 



T3 = 0{ne ^'^log^(l/£:) log^ ra)) time by Data-Structure 2.8 Throughout the algorithm 



execution, we perform 0{kN) such queries on this data-structure, which takes 

T4 = O {kNe~'^ log n log log n) = O {ckne~'^ log n log log n) 

time overall. 

As such, the overall running time is 

Ti + T2 + T3 + T4 = 0{ncke'^ logn + kN log N + ne''^'^ \og^ {I / e) log^ n + c^n/e) 
= 0{Ckn\og^n), 

where C = c^£:~^'^log(l/£:). ■ 

5.4 Result 

Theorem 5.5. Given an integer k > 0, a parameter e > 0, and two c-packed polygonal 
curves X and Y , with total complexity n, the algorithm described above computes a {3 + e)- 
approximation to the k-shortcut Frechet distance between X and Y in 0(c^/cnlog^n) time. 
The algorithm also outputs the shortcut curve ofY and the reparametrizations that realize 
the respective k-shortcut Frechet distance. 



Proof: The main algorithm is described in Section 3.3 and analyzed in Section 4.4. We use
the decision procedure k-Decider (Section 5.2) instead of Decider. The running time and
correctness therefore follow from the proof of Theorem 4.11, except that we use Lemma 5.4
for the decision procedure. ■

6 Data-structure for approximating the Fréchet distance between a subcurve and a query segment

Let Z be a polygonal curve in $\mathbb{R}^d$ with n vertices. Here we describe how to build a
data-structure such that given a query segment h, and two points $p, q \in Z$, one can quickly
approximate the Fréchet distance between h and the subcurve $Z\langle p, q\rangle$.

6.1 Useful lemmas for curves and segments 

The following lemmas show that
(i) the spine of a curve is, up to a factor of two, the closest segment to this curve with
respect to the Fréchet distance, see Lemma 6.1,
(ii) the Fréchet distance between a curve and its spine is monotone, up to a factor of two,
with respect to subcurves, see Lemma 6.2, and
(iii) shortcutting a curve cannot increase the distance of the curve from a line segment, see
Lemma 6.3.

Lemma 6.1. Let pq be a segment and Y be a curve. Then, (i) $d_{\mathcal{F}}(pq, Y) \ge d_{\mathcal{F}}(pq, \mathrm{spine}(Y))$,
and (ii) $d_{\mathcal{F}}(pq, Y) \ge d_{\mathcal{F}}(\mathrm{spine}(Y), Y)/2$.

Proof: Let r and u be the endpoints of Y; that is, spine(Y) = ru.

(i) Since in any parameterization of pq with Y it must be that p is matched to r,
and q is matched to u, it follows that $d_{\mathcal{F}}(pq, Y) \ge \max(\|p - r\|, \|q - u\|) = d_{\mathcal{F}}(pq, ru) =
d_{\mathcal{F}}(pq, \mathrm{spine}(Y))$, by Observation A.1.

(ii) By (i) and the triangle inequality, we have that $d_{\mathcal{F}}(\mathrm{spine}(Y), Y) \le d_{\mathcal{F}}(\mathrm{spine}(Y), pq) +
d_{\mathcal{F}}(pq, Y) \le 2\, d_{\mathcal{F}}(pq, Y)$, which implies the claim. ■
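
Part (i) of the lemma gives a lower bound that is trivial to evaluate, since the Fréchet distance between two segments is the larger of the two endpoint distances. The following minimal sketch illustrates this; the function names are illustrative and not part of the paper, and curves are assumed to be given as lists of coordinate tuples.

```python
import math

def frechet_segments(p, q, r, u):
    """Frechet distance between the segments pq and ru.

    For two segments this equals the larger of the endpoint distances
    ||p - r|| and ||q - u||.
    """
    return max(math.dist(p, r), math.dist(q, u))

def spine_lower_bound(p, q, curve):
    """Lower bound on the Frechet distance between pq and a polygonal curve,
    obtained by comparing pq with the spine of the curve (Lemma 6.1 (i))."""
    return frechet_segments(p, q, curve[0], curve[-1])
```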

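Lemma 6.2. Given two curves Y and $\hat{Y}$, such that $\hat{Y}$ is a subcurve of Y. Then, we have
that $d_{\mathcal{F}}(\mathrm{spine}(\hat{Y}), \hat{Y}) \le 2\, d_{\mathcal{F}}(\mathrm{spine}(Y), Y)$.

Proof: Consider the matching under the optimal Fréchet distance between Y and spine(Y).
It has to match the endpoints of $\hat{Y}$ to points q and r on spine(Y). We have that
$d_{\mathcal{F}}(\hat{Y}, qr) \le d_{\mathcal{F}}(Y, \mathrm{spine}(Y))$. By Lemma 6.1 (i), we have
$d_{\mathcal{F}}(\mathrm{spine}(\hat{Y}), qr) \le d_{\mathcal{F}}(\hat{Y}, qr) \le d_{\mathcal{F}}(Y, \mathrm{spine}(Y))$.
Now, by the triangle inequality, we have that

$d_{\mathcal{F}}(\hat{Y}, \mathrm{spine}(\hat{Y})) \le d_{\mathcal{F}}(\hat{Y}, qr) + d_{\mathcal{F}}(qr, \mathrm{spine}(\hat{Y})) \le 2\, d_{\mathcal{F}}(Y, \mathrm{spine}(Y))$. ■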

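Lemma 6.3. Let $Y = u_1 u_2 \cdots u_n$ be a polygonal curve, pq be a segment, and let $i < j$ be
any two indices. Then, for $Y' = Y\langle u_1, u_i\rangle + u_i u_j + Y\langle u_j, u_n\rangle$, we have $d_{\mathcal{F}}(Y', pq) \le d_{\mathcal{F}}(Y, pq)$.

Proof: Consider the parameterization realizing $d_{\mathcal{F}}(Y, pq)$, and break it into three portions:
(i) the portion matching $Y\langle u_1, u_i\rangle$ with a "prefix" $pp' \subseteq pq$,
(ii) the portion matching $Y\langle u_i, u_j\rangle$ with a subsegment $p'q' \subseteq pq$, and
(iii) the portion matching $Y\langle u_j, u_n\rangle$ with a "suffix" $q'q \subseteq pq$.

Now, by Lemma 6.1 (i), we have that

$d_{\mathcal{F}}(Y, pq) = \max\bigl( d_{\mathcal{F}}(Y\langle u_1, u_i\rangle, pp'),\ d_{\mathcal{F}}(Y\langle u_i, u_j\rangle, p'q'),\ d_{\mathcal{F}}(Y\langle u_j, u_n\rangle, q'q) \bigr)$
$\ge \max\bigl( d_{\mathcal{F}}(Y\langle u_1, u_i\rangle, pp'),\ d_{\mathcal{F}}(u_i u_j, p'q'),\ d_{\mathcal{F}}(Y\langle u_j, u_n\rangle, q'q) \bigr)$
$\ge d_{\mathcal{F}}(Y', pq)$. ■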

6.2 A constant factor approximation for segment queries 

6.2.1 The data-structure 

Preprocessing. Build a balanced binary tree T on the edges of Z. Every internal node
$\nu$ of T corresponds to a subcurve of Z, denoted by $\mathrm{cr}(\nu)$. For a node $\nu \in V(T)$, let $\mathrm{seg}(\nu)$
denote the spine of $\mathrm{cr}(\nu)$ (i.e., the segment connecting the two endpoints of the curve $\mathrm{cr}(\nu)$).
For every node $\nu \in V(T)$, we precompute the Fréchet distance of the curve $\mathrm{cr}(\nu)$ to the
segment $\mathrm{seg}(\nu)$, and let $d_\nu$ denote this distance.

Answering a query. Given any two vertices u and v of Z and a corresponding pair of
points p and q, our target is to approximate $d_{\mathcal{F}}(pq, Z\langle u, v\rangle)$. To this end, one can compute,
in $O(\log n)$ time, $k = O(\log n)$ nodes $\nu_1, \ldots, \nu_k$ of T, such that $Z\langle u, v\rangle = \mathrm{cr}(\nu_1) + \mathrm{cr}(\nu_2) +
\cdots + \mathrm{cr}(\nu_k)$. We compute the polygonal curve $Y = \mathrm{seg}(\nu_1) + \cdots + \mathrm{seg}(\nu_k)$, and compute its
Fréchet distance from the segment pq; that is, $d = d_{\mathcal{F}}(pq, Y)$. We return

$\Delta = d + \max_{i=1}^{k} d_{\nu_i}$

as the approximate distance between pq and the subcurve $Z\langle u, v\rangle$.
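
The decomposition step of the query can be sketched as follows. This is an illustration under simplifying assumptions rather than the paper's implementation: the balanced tree is kept implicit (a node is a half-open range of edge indices that is split at its midpoint), the precomputed values $d_\nu$ and the distance d are taken as given inputs, and all function names are made up for the example.

```python
def canonical_nodes(lo, hi, a, b):
    """Decompose the edge range [a, b) into O(log n) nodes of an implicit
    balanced binary tree whose root covers the edge range [lo, hi).

    A node is a pair (s, t): it stores the subcurve from vertex s to vertex t.
    """
    if a >= b or a >= hi or b <= lo:
        return []                      # no overlap with this subtree
    if a <= lo and hi <= b:
        return [(lo, hi)]              # node lies fully inside the query range
    mid = (lo + hi) // 2
    return canonical_nodes(lo, mid, a, b) + canonical_nodes(mid, hi, a, b)

def spine_polyline(vertices, nodes):
    """Concatenate the spines seg(nu_1), ..., seg(nu_k) into the curve Y."""
    return [vertices[s] for s, _ in nodes] + [vertices[nodes[-1][1]]]

def approx_distance(d_to_spines, node_errors):
    """Combine d = d_F(pq, Y) with the precomputed errors d_nu of the nodes."""
    return d_to_spines + max(node_errors)
```

For a curve with 8 edges and the query vertices 1 and 7, `canonical_nodes(0, 8, 1, 7)` yields four nodes, so Y has four edges regardless of how many edges the subcurve itself has.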

6.2.2 Analysis 

Lemma 6.4. Given a polygonal curve Z with n edges, one can preprocess it in $O(n \log^2 n)$
time, such that for any pair $u, v \in V(Z)$ and a segment pq, one can compute, in $O(\log n \log\log n)$
time, a 3-approximation to $d_{\mathcal{F}}(pq, Z\langle u, v\rangle)$.

Proof: The construction of the data-structure and how to answer a query is described above.
For the preprocessing time, observe that computing the Fréchet distance of a segment to
a polygonal curve with k segments takes $O(k \log k)$ time [AG95]. As such, the distance
computations in each level of the tree T take $O(n \log n)$ time, and $O(n \log^2 n)$ time overall.

As for the query time, computing Y takes $O(\log n)$ time, and computing its Fréchet
distance from pq takes $O(\log n \log\log n)$ time [AG95].

Finally, observe that the returned distance $\Delta$ is a realizable Fréchet distance, as we can
take the parameterization between pq and Y, and chain it with the parameterization of every
edge of Y with its corresponding subcurve of Z. Clearly, the resulting parameterization has
width at most $\Delta$.

Let t be the index realizing $\max_{i=1}^{k} d_{\nu_i}$. Then, by repeated application of Lemma 6.3, we
have that $d = d_{\mathcal{F}}(pq, Y) \le d_{\mathcal{F}}(pq, Z)$. As such, we have that

$\Delta = d + \max_{i=1}^{k} d_{\nu_i} = d_{\mathcal{F}}(pq, Y) + d_{\nu_t} \le d_{\mathcal{F}}(pq, Z) + d_{\mathcal{F}}(\mathrm{seg}(\nu_t), \mathrm{cr}(\nu_t))$
$\le d_{\mathcal{F}}(pq, Z) + 2 \min_{p'q' \subseteq pq} d_{\mathcal{F}}(p'q', \mathrm{cr}(\nu_t)) \le 3\, d_{\mathcal{F}}(pq, Z)$.

To see the last step, consider the parameterization realizing $d_{\mathcal{F}}(pq, Z)$, and consider the
subsegment p'q' of pq that is being matched to $\mathrm{cr}(\nu_t) \subseteq Z$. Clearly, $d_{\mathcal{F}}(p'q', \mathrm{cr}(\nu_t)) \le d_{\mathcal{F}}(pq, Z)$. ■

Theorem 6.5. Given a polygonal curve Z with n edges, one can preprocess it in $O(n \log^2 n)$
time and using $O(n)$ space. Now, given a query specified by
(i) a pair of points u and v on the curve Z,
(ii) the edges containing these two points, and
(iii) a pair of points p and q,
one can compute, in $O(\log n \log\log n)$ time, a 3-approximation to $d_{\mathcal{F}}(pq, Z\langle u, v\rangle)$.

Proof: This follows by a relatively minor modification of the above algorithm and analysis.
Indeed, given u and v (and the edges containing them), the data-structure computes the two
vertices u', v' that are endpoints of these edges that lie between u and v on the curve. The
data-structure then concatenates the segments uu' and v'v to the approximation Y (here Y
is computed for the vertices u' and v'). The remaining details are as described above. ■
6.3 A $(1+\varepsilon)$-approximation: a segment query and the entire curve

6.3.1 Data-structure

We are given a polygonal curve Z in $\mathbb{R}^d$ with n segments, and we would like to preprocess
it for $(1+\varepsilon)$-approximate Fréchet distance queries against a query segment. To this end, let
$L = d_{\mathcal{F}}(uv, Z)$, where uv is the spine of Z. We sprinkle an exponential grid G(u) of points
around u, such that for any point $p \in \mathbb{R}^d$, if $\|p - u\| \le 10L/\varepsilon$ then there exists a point
$p' \in G(u)$ such that $\|p - p'\| \le (\varepsilon/4)\|p - u\| + \varepsilon L/10$. It is easy to verify that this can be
guaranteed while the resulting set G(u) has $O(\varepsilon^{-d} \log(1/\varepsilon))$ points. We sprinkle a similar grid
G(v) around the vertex v.

Now, for every pair of points $(p', q') \in G(u) \times G(v)$ we compute the Fréchet distance
$D[p', q'] = d_{\mathcal{F}}(p'q', Z)$ and store it. Thus, the preprocessing takes $O(n \varepsilon^{-2d} \log^2(1/\varepsilon) \log n)$
time, and requires $O(\varepsilon^{-2d} \log^2(1/\varepsilon))$ space.

Answering a query. Given a query segment pq, we compute the distance

$r = \max(\|p - u\|, \|q - v\|)$.

If $r > 10L/\varepsilon$ then we return $r + L$ as the approximation to the distance $d_{\mathcal{F}}(pq, Z)$.

Otherwise, let p' (resp. q') be the nearest neighbor to p (resp. q) in G(u) (resp. G(v)). We return
the distance

$\Delta = D[p', q'] + \max(\|p - p'\|, \|q - q'\|)$

as the approximation.
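
One possible way to realize the snapping to an exponential grid is sketched below. The construction of G(u) is not spelled out in the text, so the concrete choices here are assumptions: the resolution doubles with the distance from u, starting at the base scale 0.4·L, and within each scale R the grid has side length $\varepsilon R/(4\sqrt{d})$. With these choices the snapped point is within $\varepsilon R/8$ of p, which is at most $(\varepsilon/4)\|p - u\| + \varepsilon L/10$. The sketch assumes L > 0 and only covers the snapping, not the table D.

```python
import math

def snap_to_exponential_grid(p, u, L, eps):
    """Return a grid point p' with ||p - p'|| <= (eps/4)*||p - u|| + eps*L/10.

    Assumed construction (not taken from the paper): the scale R is the
    smallest value of the form 0.4 * L * 2**j that is at least ||p - u||.
    """
    r = math.dist(p, u)
    R = 0.4 * L
    while R < r:                       # pick the resolution from the distance to u
        R *= 2.0
    side = eps * R / (4.0 * math.sqrt(len(p)))
    # Round every coordinate to the nearest multiple of `side`, relative to u.
    return tuple(ui + side * round((pi - ui) / side) for pi, ui in zip(p, u))
```

Only $O(\log(1/\varepsilon))$ scales are needed to cover all points within distance $10L/\varepsilon$ of u, and each scale contributes $O(\varepsilon^{-d})$ grid points, which matches the size bound stated for G(u).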

6.3.2 Analysis 

Lemma 6.6. Given a polygonal curve Z with n vertices in $\mathbb{R}^d$, one can build a
data-structure, in $O(n \varepsilon^{-2d} \log^2(1/\varepsilon) \log n)$ time, that uses $O(\varepsilon^{-2d} \log^2(1/\varepsilon))$ space, such that given
a query segment pq one can $(1 + \varepsilon)$-approximate $d_{\mathcal{F}}(pq, Z)$ in $O(1)$ time.

Proof: The data-structure is described above.

As for the quality of approximation, using the notations above, we have that $d_{\mathcal{F}}(pq, Z) \ge r$.
As such, if $r > 10L/\varepsilon$ then $r \le d_{\mathcal{F}}(pq, Z) \le r + L \le (1 + \varepsilon)r \le (1 + \varepsilon)\, d_{\mathcal{F}}(pq, Z)$.

Otherwise, by Lemma 6.1, we have that $d_{\mathcal{F}}(pq, Z) \ge L/2$. As such, we have that

d:,(pq, pV) = max(||p - p'|| , ||q - q'||) < {er + eL)/% < (£/4)d^(pq, Z) . 

We conclude that for the returned distance $\Delta$ it holds

$\Delta = d_{\mathcal{F}}(p'q', Z) + d_{\mathcal{F}}(pq, p'q') \ge d_{\mathcal{F}}(pq, Z) \ge d_{\mathcal{F}}(p'q', Z) - d_{\mathcal{F}}(pq, p'q')$
$= \Delta - 2\, d_{\mathcal{F}}(pq, p'q') \ge \Delta - (\varepsilon/2)\, d_{\mathcal{F}}(pq, Z)$.

Namely, $\Delta \le (1 + \varepsilon/2)\, d_{\mathcal{F}}(pq, Z)$ as desired.

As for the query time, given pq we compute the distance of the endpoints of this segment
from the endpoints of Z. If they are too far away, we are done, as then the Fréchet distance is
determined by these distances and can be computed in constant time. Otherwise, we need to find
the two points in the exponential grid closest to p and q. For the point p, its distance from
u tells us which resolution of the grid we should use. Next, by direct lookup into this grid
(by using hashing), one can find the cell containing p in constant time. Now, finding the
closest vertex of the cell to the query point takes constant time. We apply the same process
to q. Now that we have the two corresponding grid points, computing the approximate
distance takes constant time. ■

6.4 A $(1 + \varepsilon)$-approximation: a segment query and a subcurve

6.4.1 Data-structure

Preprocessing. Let Z be a given polygonal curve with n vertices. We build the
data-structure of Theorem 6.5. Next, for each node of the resulting tree T, we build for its
subcurve the data-structure of Lemma 6.6.

Answering a query. A query is specified by points u, v, p and q. Here u and v are points
on Z (and we are also given the edges of Z containing these two points), and p, q define the
query segment. Using the data-structure of Theorem 6.5, we obtain a 3-approximation r to
$d_{\mathcal{F}}(pq, Z\langle u, v\rangle)$; that is, $d_{\mathcal{F}}(pq, Z\langle u, v\rangle) \le r \le 3\, d_{\mathcal{F}}(pq, Z\langle u, v\rangle)$.

This query also results in a decomposition of $Z\langle u, v\rangle$ into $m = O(\log n)$ subcurves, and
we let $u = v_0, v_1, \ldots, v_{m-1}, v_m = v$ be the vertices of these subcurves, where $v_0 v_1$ and $v_{m-1} v_m$
are subsegments of Z.

We partition the segment pq into $M = \lceil \|pq\| / (\varepsilon r/20) \rceil$ equal length segments. Let V be
the set of vertices of this uniform partition.

For each vertex $v_i$, for $i = 1, \ldots, m-1$, we compute its nearest point on pq, and let
$V_i \subseteq V$ be the set of all vertices in V that are in distance at most 2r from $v_i$. The set $V_i$
is the set of candidate points to match $v_i$ in the parametrization that realizes the Fréchet
distance. Now, for $i = 1, \ldots, m-1$, and every pair of points $x \in V_i$ and $y \in V_{i+1}$, compute a
$(1 + \varepsilon/4)$-approximation to the Fréchet distance between $Z\langle v_i, v_{i+1}\rangle$ and xy. This portion of
the curve corresponds to a node in T, and this node has an associated data-structure that
can answer such queries in constant time (see Lemma 6.6).
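
The candidate sets can be generated directly from the definition. The sketch below is only an illustration with made-up function names; points are coordinate tuples, r is the 3-approximation returned by the first query, and the candidate test uses the distance to $v_i$ exactly as stated above.

```python
import math

def uniform_partition(p, q, r, eps):
    """Vertices of the partition of pq into M = ceil(|pq| / (eps * r / 20))
    pieces of equal length (at least one piece)."""
    M = max(1, math.ceil(math.dist(p, q) / (eps * r / 20.0)))
    return [tuple(a + (b - a) * t / M for a, b in zip(p, q)) for t in range(M + 1)]

def candidates(partition_vertices, v, r):
    """The set V_i: partition vertices within distance 2r of the curve vertex v."""
    return [x for x in partition_vertices if math.dist(x, v) <= 2.0 * r]
```

For p = (0, 0), q = (1, 0), r = 0.625 and eps = 0.5 the step length is 1/64, so the partition has 65 vertices; a curve vertex at (0, 1.2) keeps the 23 of them whose first coordinate is at most 0.35.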

For any point $x \in V_1$, we directly compute the Fréchet distance of $v_0 v_1$ with px. Similarly,
we compute, for each $y \in V_{m-1}$, the Fréchet distance of the segment $v_{m-1} v_m$ to the segment
yq.

Now, using dynamic programming, we find the cheapest parameterization of $Z\langle u, v\rangle$ (broken
into subcurves by the vertices $v_0, \ldots, v_m$) with $V_0 \times V_1 \times \cdots \times V_{m-1} \times V_m$, where $V_0 = \{p\}$,
$V_m = \{q\}$, and every subcurve $Z\langle v_i, v_{i+1}\rangle$ is matched with two points in the corresponding
sets $V_i$ and $V_{i+1}$.

Formally, we build a graph G where $\bigcup_i V_i$ is the multiset of vertices. Two points $x \in V_i$
and $y \in V_{i+1}$ are connected by a directed edge in this graph if and only if y is after x in
the oriented segment pq. The price of such an edge $x \to y$ is the approximated cost of the
Fréchet distance between $Z\langle v_i, v_{i+1}\rangle$ and xy. We are looking for the path in this graph with
minimum maximum cost on a single edge, connecting p with q. The cost of this path is
returned as the approximation to the Fréchet distance between $Z\langle u, v\rangle$ and pq.
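
Because the graph is layered, the bottleneck path can be found by a single sweep over the layers. The sketch below illustrates this dynamic program under simplifying assumptions that are not part of the text: candidate points are represented by their position along pq (a number in [0, 1]), and the edge prices are supplied by a hypothetical callback `cost(i, x, y)` standing in for the constant-time queries of Lemma 6.6.

```python
import math

def bottleneck_path(layers, cost):
    """Minimum over all monotone paths of the maximum edge price.

    layers[i] lists the candidate positions of V_i along pq, with
    layers[0] == [0.0] (the point p) and layers[-1] == [1.0] (the point q).
    cost(i, x, y) is the price of the edge from x in V_i to y in V_{i+1}.
    Returns math.inf if no monotone path exists.
    """
    best = {x: 0.0 for x in layers[0]}          # bottleneck cost of reaching x
    for i in range(len(layers) - 1):
        nxt = {}
        for y in layers[i + 1]:
            # Only edges that move forward along the oriented segment pq.
            options = [max(b, cost(i, x, y)) for x, b in best.items() if x <= y]
            if options:
                nxt[y] = min(options)
        best = nxt
    return min(best.values()) if best else math.inf
```

The sweep inspects every edge once, so its running time is linear in the number of edges of G; it is a simpler alternative to the Dijkstra variant used in the analysis, not a replacement for that bound.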
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6.4.2 Analysis 

Query time. Computing the set of vertices $v_0, v_1, \ldots, v_m$ takes $O(m) = O(\log n)$ time.
The graph G has $O(m/\varepsilon)$ vertices and they can be computed in $O(m/\varepsilon)$ time. Similarly,
this graph has $O(m/\varepsilon^2)$ edges, and the cost of each edge can be computed in constant
time, see Lemma 6.6. Computing the cheapest path between p and q in G can be done in
$O((m/\varepsilon) \log(m/\varepsilon) + m/\varepsilon^2)$ time, using a variant of Dijkstra's algorithm. Overall, the query
time is $O(m + (m/\varepsilon) \log(m/\varepsilon) + m/\varepsilon^2) = O(\varepsilon^{-2} \log n \log\log n)$.

Preprocessing time and space. Building the data-structure described in Theorem 6.5
takes $O(n \log^2 n)$ time. For each node $\nu$ of this tree, building the data-structure of Lemma 6.6
takes

$O(n_\nu\, \varepsilon^{-2d} \log^2(1/\varepsilon) \log n_\nu)$

per node, where $n_\nu$ is the number of vertices of the curve stored in the subtree of $\nu$. As
such, overall, the preprocessing time is $O(n \varepsilon^{-2d} \log^2(1/\varepsilon) \log^2 n)$. For each node, this
data-structure requires $O(\varepsilon^{-2d} \log^2(1/\varepsilon))$ space.

The result. Putting the above together, we get the following result. 

Theorem 6.7. Given a polygonal curve $Z$ with $n$ vertices in $\mathbb{R}^d$, one can build a data-
structure, in $O(n\varepsilon^{-2d}\log^2(1/\varepsilon)\log^2 n)$ time, using $O(n\varepsilon^{-2d}\log^2(1/\varepsilon))$ space, such that for a
query segment $pq$, and any two points $u$ and $v$ on the curve (and the segments of the curve that
contain them), one can $(1+\varepsilon)$-approximate the distance $d_{\mathcal{F}}(Z\langle u,v\rangle, pq)$ in $O(\varepsilon^{-2}\log n \log\log n)$
time.



We emphasize that the result of Theorem 6.7 assumes nothing about the input curve $Z$. In
particular, the curve $Z$ is not necessarily $c$-packed.

7 Universal Frechet simplification and applications 

Here, we study the problem of computing a permutation of the vertices of the curve, such that 
for any k, the curve formed by the first k vertices in this permutation is a good approximation 
to the optimal simplification of a curve using (roughly) k vertices. 

7.1 Universal simplification 

We can use the data-structure described above to preprocess $Z$, such that, given a number
of vertices $k \in \mathbb{N}$, we can quickly return a simplification of $Z$ which has $2k - 1$ vertices of
the original curve and minimal Frechet distance to $Z$ up to a constant factor, compared to
any simplification of $Z$ with only $k$ vertices.

Definition 7.1. Let $Z$ be a polygonal curve and let $V$ be a subset of the vertices of $Z$ which
contains the endpoints of $Z$. Consider the curve $Z'$, which has $V$ as its set of vertices and
such that $v \in V$ appears before $u \in V$ on $Z'$ if and only if this is true for $Z$. We call $Z'$ a
spine curve of $Z$ and we denote it with $Z' = Z_V$. Additionally, we may call $Z'$ a $k$-spine
curve of $Z$ if it has $k$ vertices.

Definition 7.2. Given a polygonal curve $Z$ and a permutation $\Phi = \langle v_1, \ldots, v_n\rangle$ of the
vertices of $Z$, where $v_1$ and $v_2$ are the endpoints of $Z$, let $V_i$ be the subset $\{v_j \mid 1 \le j \le i\}$
of the vertices for any $2 \le i \le n$. We call $\Phi$ a universal permutation if it holds that
(i) $c_1 d_{\mathcal{F}}(Z_{V_i}, Z) \ge d_{\mathcal{F}}(Z_{V_{i+1}}, Z)$, for any $2 \le i < n$, and
(ii) $d_{\mathcal{F}}(Z_{V_i}, Z) \le c_2 d_{\mathcal{F}}(Y, Z)$, for any polygonal curve $Y$ with $\lceil i/c_3 \rceil$ vertices,
where $c_1$, $c_2$ and $c_3$ are constants larger than one which do not depend on $n$.

7.1.1 Construction of the permutation 

We compute a universal permutation of Z. The idea of the algorithm is to estimate for each 
vertex the magnitude of error introduced by removing it, and repeatedly remove the vertex 
with the lowest error in a greedy fashion. 

Specifically, for each vertex $v$, which is not an endpoint of $Z$, let $v^-$ be its predecessor
on $Z$ and let $v^+$ be its postdecessor on $Z$. Let $\phi_v$ be a $(11/10)$-approximation of
$d_{\mathcal{F}}(Z\langle v^-, v^+\rangle, v^- v^+)$. Insert the vertex $v$ with weight $\phi_v$ to a min heap $\mathcal{H}$. Repeat this
for all the internal vertices of $Z$.

At each step, the algorithm extracts the vertex $v$ from the heap $\mathcal{H}$ having minimum
weight. Let $u = v^-(Z_{\mathcal{H}})$ and $w = v^+(Z_{\mathcal{H}})$ be the predecessor and postdecessor of $v$ in the
curve $Z_{\mathcal{H}}$, respectively, where $\mathcal{H}$ denotes the set of vertices currently in the heap with the
addition of the two endpoints of $Z$.

The algorithm removes $v$ from $\mathcal{H}$ and updates the weights of $u$ and $w$ in $\mathcal{H}$ (if the vertex
being updated is an endpoint of $Z$, its weight is $+\infty$ and it is not updated).
Updating the weight of a vertex $u$ is done by computing its predecessor and postdecessor
vertices in the current curve $Z_{\mathcal{H}}$ (i.e., $u^- = u^-(Z_{\mathcal{H}})$ and $u^+ = u^+(Z_{\mathcal{H}})$) and approximating the
Frechet distance of the subcurve of (the original curve) $Z$ between these two vertices and the
segment $u^- u^+$. Formally, the updated weight of $u$ is $\phi_u$, which is a $(11/10)$-approximation
to $d_{\mathcal{F}}(Z\langle u^-, u^+\rangle, u^- u^+)$.

The updated weight of $w$ is computed in a similar fashion.

The algorithm stops when $\mathcal{H}$ is empty. Reversing the order of the handled vertices results
in a permutation $\langle v_1, \ldots, v_n\rangle$, where $v_1$ and $v_2$ are the two endpoints of $Z$.
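
The greedy removal loop can be sketched in Python as follows. This is a minimal illustration, not the implementation analyzed below: `standin_weight` is a hypothetical placeholder (the largest distance from an intermediate vertex to the shortcut segment) used instead of the $(11/10)$-approximate Frechet distance query, and since `heapq` has no decrease-key operation, weight updates are handled by pushing a fresh entry and skipping stale ones.

```python
import heapq
import math

def point_segment_dist(p, a, b):
    """Euclidean distance from point p to the segment ab (2D tuples)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    denom = dx * dx + dy * dy
    t = 0.0 if denom == 0 else ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / denom
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def standin_weight(curve, i, j):
    """Hypothetical stand-in for the approximate Frechet distance between the
    subcurve curve[i..j] and the segment curve[i]curve[j]."""
    return max((point_segment_dist(curve[k], curve[i], curve[j])
                for k in range(i + 1, j)), default=0.0)

def universal_permutation(curve, weight=standin_weight):
    """Return vertex indices: both endpoints first, then the internal vertices
    in reverse order of their greedy removal."""
    n = len(curve)
    prev = list(range(-1, n - 1))   # doubly linked list over surviving vertices
    nxt = list(range(1, n + 1))
    version = [0] * n               # detects stale heap entries
    heap = [(weight(curve, i - 1, i + 1), 0, i) for i in range(1, n - 1)]
    heapq.heapify(heap)
    removed = []
    while heap:
        _, ver, v = heapq.heappop(heap)
        if ver != version[v]:
            continue                # weight was updated later, or v is gone
        removed.append(v)
        version[v] = -1             # mark v as removed
        u, w = prev[v], nxt[v]
        nxt[u], prev[w] = w, u      # unlink v
        for y in (u, w):
            if 0 < y < n - 1:       # endpoints keep weight +infinity
                version[y] += 1
                heapq.heappush(heap, (weight(curve, prev[y], nxt[y]), version[y], y))
    return [0, n - 1] + removed[::-1]
```

Each extraction pushes at most two new heap entries, so the heap holds $O(n)$ entries in total; the cost of the weight computation depends entirely on what is plugged in for `weight`.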



Implementation details. Using Theorem 6.7, the initialization takes $O(n\log^2 n)$ time
overall, using $\varepsilon = 1/10$. In addition, the algorithm keeps the current set of vertices of $\mathcal{H}$
in a doubly linked list in the order in which the vertices appear along the original curve $Z$.
In each iteration, the algorithm performs one extract-min from the min-heap $\mathcal{H}$, and calls
the data-structure of Theorem 6.7 twice to update the weight of the two neighbors of the
extracted vertex. As such, overall, the running time of this algorithm is $O(n\log^2 n)$.

Extracting a spine curve quickly. Given a parameter $K$, we would like to be able to
quickly compute the spine curve $Z_{V_K}$, where $V_K = \{v_1, \ldots, v_K\}$. To this end, we compute
for $i = 1, \ldots, \lfloor \log_2 n \rfloor$, the spine curve $Z_{V_{2^i}}$ by removing the unused vertices from $Z_{V_{2^{i+1}}}$.
Naturally, we also store the original curve $Z$. Clearly, one can store these $O(\log n)$ curves in
$O(n)$ space, and compute them in linear time. Now, given $K$, one can find the first curve
in this collection that has more vertices than $K$, copy it, and remove from it all the unused
vertices. Clearly, this query can be answered in $O(K)$ time.
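
A possible layout for the stored collection is sketched below in Python. The names `build_levels` and `spine` are illustrative, vertices are identified by their index along $Z$, and the level sizes are obtained by repeated halving rather than exact powers of two, which is an assumption made to keep the sketch short; the total storage is still a geometric sum.

```python
def build_levels(perm):
    """perm[r] is the index (along Z) of the (r+1)-th vertex of the permutation.
    Returns the rank of every vertex and a list of nested spine curves,
    each stored as vertex indices in curve order, with sizes n, n/2, n/4, ..."""
    n = len(perm)
    rank = [0] * n
    for r, v in enumerate(perm):
        rank[v] = r
    levels = []
    current = list(range(n))        # the original curve
    size = n
    while True:
        levels.append(current)
        if size <= 2:
            break
        size = max(2, size // 2)
        # keep only the first `size` vertices of the permutation, in curve order
        current = [v for v in current if rank[v] < size]
    return rank, levels

def spine(rank, levels, K):
    """Spine curve on the first K vertices of the permutation (2 <= K <= n)."""
    best = levels[0]
    for lvl in levels:              # levels are stored from largest to smallest
        if len(lvl) >= K:
            best = lvl              # smallest stored level with at least K vertices
        else:
            break
    return [v for v in best if rank[v] < K]
```

The chosen level has fewer than $2K$ vertices, so the final filtering step touches $O(K)$ vertices; the linear scan over the levels is kept only for simplicity.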

7.1.2 Analysis 

Lemma 7.3. Let $\langle v_1, \ldots, v_n\rangle$ be the permutation computed above. Consider a value $k$, and
let $V_k = \{u_1, \ldots, u_k\}$ be an ordering of the vertices $v_1, \ldots, v_k$ by their place along $Z$. Then,
it holds that $d_{\mathcal{F}}(Z, Z_{V_k}) \le \max_{1 \le i \le k-1} d_{\mathcal{F}}(Z\langle u_i, u_{i+1}\rangle, u_i u_{i+1})$.

Proof: This is immediate, as one can concatenate, for $i = 1, \ldots, k - 1$, the reparameteriza-
tions realizing $d_{\mathcal{F}}(Z\langle u_i, u_{i+1}\rangle, u_i u_{i+1})$, to obtain reparameterizations of $Z_{V_k}$ and $Z$, such
that the Frechet distance is the maximum used in any of these reparameterizations. ■

Let $v_1, \ldots, v_n$ be the permutation of the vertices of $Z$ as computed in the preprocessing
stage, and let $\phi(v_i)$ denote the weight of vertex $v_i$ at the time of its extraction. We have the
following three lemmas to prove that the computed permutation is universal.

Lemma 7.4. For any $1 \le i \le n$, it holds that $\max_{i \le j \le n} \phi(v_j) \le 4\phi(v_i)$.

Proof: We show that the weight of a vertex at the time of extraction is at most a constant
factor smaller than the final weight of any of the vertices extracted before this vertex. Let $v_i$
be such a vertex and let $\phi_j(v_i)$ be the weight of this vertex at the time of extraction of some
other vertex $v_j$, with $j > i$. Clearly, $\phi(v_j) = \phi_j(v_j) \le \phi_j(v_i)$, since the algorithm extracted
$v_j$ with the minimum weight at the time. If $\phi(v_i) = \phi_i(v_i) \ge \phi_j(v_i)$ then the claim holds.

Otherwise, if $\phi(v_i) = \phi_i(v_i) < \phi_j(v_i)$, then there must be a vertex which caused the weight
of $v_i$ to be updated. Let $k$ be the minimum index, such that $j > k > i$ and $\phi_j(v_i) = \phi_k(v_i)$.
We have that $\phi(v_i)$ is a $\frac{11}{10}$-approximation of the Frechet distance $d_{\mathcal{F}}(u^i w^i, Z\langle u^i, w^i\rangle)$ for two
vertices $u^i$ and $w^i$. Similarly, we have that $\phi_k(v_i)$ is a $\frac{11}{10}$-approximation of the Frechet
distance $d_{\mathcal{F}}(u^k w^k, Z\langle u^k, w^k\rangle)$ for two vertices $u^k$ and $w^k$. Observe that since the extraction
of $v_k$ caused the weight of $v_i$ to be updated, it must be that $Z\langle u^k, w^k\rangle$ is a subcurve of
$Z\langle u^i, w^i\rangle$. As such, we have by Lemma 2.11, that

$$\tfrac{10}{11} \cdot \phi_k(v_i) \le d_{\mathcal{F}}(u^k w^k, Z\langle u^k, w^k\rangle) \le 3\, d_{\mathcal{F}}(u^i w^i, Z\langle u^i, w^i\rangle) \le 3 \cdot \tfrac{11}{10} \cdot \phi(v_i).$$

Now it follows that $\phi(v_j) \le \phi_j(v_i) = \phi_k(v_i) \le 4\phi(v_i)$, which proves the claim. ■

Lemma 7.5. For any $3 \le i < n$ it holds that $d_{\mathcal{F}}(Z_{V_i}, Z) \le 5\phi(v_{i+1})$.

Proof: Let $u_1, \ldots, u_i$ be the vertices in $V_i$ in the order in which they appear on $Z_{V_i}$. Consider
the mapping between $Z$ and this spine curve, which associates every edge $u_j u_{j+1}$ of $Z_{V_i}$ with
the subcurve $Z\langle u_j, u_{j+1}\rangle$. Clearly, it holds that

$$d_{\mathcal{F}}(Z, Z_{V_i}) \le \max_{1 \le j < i} d_{\mathcal{F}}(Z\langle u_j, u_{j+1}\rangle, u_j u_{j+1}) \le \tfrac{11}{10} \max_{i < \ell \le n} \phi(v_\ell).$$

Indeed, if $u_{j+1}$ is the postdecessor of $u_j$ on $Z$, then $d_{\mathcal{F}}(Z\langle u_j, u_{j+1}\rangle, u_j u_{j+1}) = 0$. Otherwise,
there must be a vertex which appears on $Z$ in between $u_j$ and $u_{j+1}$, which is contained in
$V_n \setminus V_i$, and the weight of this vertex is the approximation of this distance at the time of
extraction. Now it follows by Lemma 7.4 that $d_{\mathcal{F}}(Z, Z_{V_i}) \le 5\phi(v_{i+1})$. ■



Lemma 7.6. For any $2 \le k \le n/2 - 1$, let $Y^*$ be the curve with the smallest Frechet distance
from $Z$ with $k$ vertices (note that $Y^*$ is not restricted to have its vertices lying on $Z$). We
have that $d_{\mathcal{F}}(Z, Y^*) \ge (5/11)\phi(v_{K+1})$, where $K = 2k - 1$.

Proof: Let $f : Y^* \to Z$ be the mapping realizing the Frechet distance between $Y^*$ and $Z$.
Let $V_i = \langle v_1, \ldots, v_i\rangle$, for $i = 1, \ldots, n$.

Since $Y^*$ has only $k$ vertices, it breaks $Z$ into $k - 1$ subcurves.
Since $K \ge 2(k - 1) + 1$, there must be three consecutive
vertices $u_i, u_{i+1}, u_{i+2}$ on $Z_{V_K}$ and two vertices $w_j, w_{j+1}$
of $Y^*$, such that the vertices $u_i, u_{i+1}, u_{i+2}$ appear on the
subcurve $Z' = Z\langle f(w_j), f(w_{j+1})\rangle$, see figure on the right.

Now, $f^{-1}(u_i) f^{-1}(u_{i+2}) \subseteq w_j w_{j+1}$ and by Lemma 6.1 (B)
(see also Definition 2.1), we have

$$\begin{aligned}
d_{\mathcal{F}}(Z, Y^*) &\ge d_{\mathcal{F}}\big(Z\langle f(w_j), f(w_{j+1})\rangle, w_j w_{j+1}\big) \ge d_{\mathcal{F}}\big(Z\langle u_i, u_{i+2}\rangle, f^{-1}(u_i) f^{-1}(u_{i+2})\big) \\
&\ge \tfrac{1}{2}\, d_{\mathcal{F}}\big(Z\langle u_i, u_{i+2}\rangle, \mathrm{spine}(Z\langle u_i, u_{i+2}\rangle)\big) \\
&\ge \tfrac{1}{2} \cdot \tfrac{10}{11}\, \phi_{K+1}(u_{i+1}) \ge \tfrac{5}{11}\, \phi_{K+1}(v_{K+1}) = \tfrac{5}{11}\, \phi(v_{K+1}),
\end{aligned}$$

as the simplification algorithm removed the minimum weight vertex at time $K + 1$ (i.e.,
$v_{K+1}$). ■

7.1.3 Result 

Theorem 7.7. Given a polygonal curve $Z$ with $n$ edges, we can preprocess it using $O(n)$
space, in $O(n\log^2 n)$ time, such that, given a parameter $k \in \mathbb{N}$, we can output in $O(k)$ time
a $(2k - 1)$-spine curve $Z'$ of $Z$ and a value $\delta$, such that

(i) $\delta/11 \le d_{\mathcal{F}}(Y^*, Z)$, and

(ii) $d_{\mathcal{F}}(Z', Z) \le \delta$,

where $Y^*$ is the polygonal curve with $k$ vertices with minimal Frechet distance from $Z$. (For
$k > n/2$ we output $Z$ and $\delta = 0$.)

Proof: The algorithm computing the universal permutation and its associated data-structure
is described above, for $K = 2k - 1$. Specifically, it returns the spine curve $Z' = Z_{V_K}$ as the
required approximation, with the value $\delta = 5\phi(v_{K+1})$. Computing $Z'$ takes $O(k)$ time. By
Lemma 7.5 and Lemma 7.6, we have that $Z'$ and $\delta$ satisfy the claim.

Building the data-structure takes $O(n\log^2 n)$ time, and it uses $O(n)$ space, and each
query to this data-structure takes $O(\log n \log\log n)$ time (using $\varepsilon = 1/10$). We perform a constant
number of these queries to the data-structure per extraction from the heap. ■
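
For reference, the two lemmas combine into the guarantee of Theorem 7.7 as follows. This is only a restatement of the bounds already proved, with $\delta = 5\phi(v_{K+1})$ as in the proof:

```latex
\begin{align*}
d_{\mathcal{F}}(Z', Z) &\le 5\,\phi(v_{K+1}) = \delta
  && \text{(Lemma 7.5)},\\
d_{\mathcal{F}}(Y^*, Z) &\ge \tfrac{5}{11}\,\phi(v_{K+1}) = \tfrac{\delta}{11}
  && \text{(Lemma 7.6)},\\
\text{hence}\quad d_{\mathcal{F}}(Z', Z) &\le \delta \le 11\, d_{\mathcal{F}}(Y^*, Z).
\end{align*}
```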
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7.2 Extensions — Queries with curves with k vertices 



Using the result in Section 7.1, we can preprocess a curve, such that one can approximate
the Frechet distance of a subcurve to a given query curve.

7.2.1 The data-structure 

The input is a polygonal curve Z with n vertices. 



Preprocessing. Similar to the algorithm of Section 6.2, build a balanced binary tree $T$
on $Z$. For every internal node $\nu$ of $T$ construct the data-structure of Theorem 7.7 for $\mathrm{cr}(\nu)$,
denoted by $\mathcal{D}_\nu$, and store it at $\nu$.

Answering a query. Given any two vertices $u$ and $v$ of $Z$, and a query polygonal curve
$Q$ with $k$ segments, the task is to approximate $d_{\mathcal{F}}(Q, Z\langle u, v\rangle)$. We initially proceed as in
Section 6.2, computing in $O(\log n)$ time, $m = O(\log n)$ nodes $\nu_1, \ldots, \nu_m$ of $T$, such that
$Z\langle u, v\rangle = \mathrm{cr}(\nu_1) + \mathrm{cr}(\nu_2) + \cdots + \mathrm{cr}(\nu_m)$. Now, extract a simplified curve with $K$ vertices from
$\mathcal{D}_{\nu_i}$, denoted by $\mathrm{simpl}_K(\nu_i)$, for $i = 1, \ldots, m$, where $K = 2k - 1$. For $i = 1, \ldots, m$, let $\delta_i$
denote the simplification error (as returned by $\mathcal{D}_{\nu_i}$), where $d_{\mathcal{F}}(\mathrm{simpl}_K(\nu_i), \mathrm{cr}(\nu_i)) \le \delta_i$ and
$\delta_i/11$ is a lower bound to the Frechet distance of any curve with at most $k$ vertices from
$\mathrm{cr}(\nu_i)$, for $i = 1, \ldots, m$ (see Theorem 7.7).

Next, compute the polygonal curve $S = \mathrm{simpl}_K(\nu_1) + \cdots + \mathrm{simpl}_K(\nu_m)$, and its Frechet
distance from $Q$; that is, $d = d_{\mathcal{F}}(S, Q)$. We return

$$\Delta = d + \max_i \delta_i \qquad (5)$$

as the approximate distance between $Q$ and $Z\langle u, v\rangle$.
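
The final combination step can be sketched in Python as below. The sketch assumes the per-node simplifications and their errors $\delta_i$ are already available as a list `pieces`, and it uses the discrete Frechet distance (a vertex-to-vertex dynamic program) as a stand-in for the continuous distance computed by the algorithm of [AG95]; both the function names and this substitution are illustrative assumptions.

```python
import math

def discrete_frechet(P, Q):
    """Discrete Frechet distance between vertex sequences P and Q.
    Stand-in only: it upper-bounds the continuous Frechet distance."""
    n, m = len(P), len(Q)
    D = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(P[i], Q[j])
            if i == 0 and j == 0:
                best = 0.0
            elif i == 0:
                best = D[0][j - 1]
            elif j == 0:
                best = D[i - 1][0]
            else:
                best = min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
            D[i][j] = max(d, best)
    return D[-1][-1]

def approx_query(pieces, Q, dist=discrete_frechet):
    """pieces is a list of (simplified_vertices, delta_i), one per tree node,
    in order along the curve. Returns Delta = d(S, Q) + max_i delta_i."""
    S = []
    for verts, _ in pieces:
        # consecutive pieces are assumed to share an endpoint; keep it once
        S.extend(verts[1:] if S and S[-1] == verts[0] else verts)
    d = dist(S, Q)
    return d + max(delta for _, delta in pieces)
```

Any routine for the Frechet distance between two polygonal curves can be passed as `dist`; only the value $d$ changes, not the structure of the returned bound.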

7.2.2 Analysis 

Query time. Extracting the $m = O(\log n)$ relevant nodes takes $O(\log n)$ time. Query-
ing these $m$ data-structures for the simplification of the respective subcurves takes $O(km)$
time overall, by Theorem 7.7. Computing the Frechet distance between the resulting simplifica-
tion $S$ of $Z\langle u, v\rangle$, that has $O(mk)$ edges, and $Q$ takes $O(k^2 m \log(k^2 m))$ time [AG95]. Thus
the overall time used for answering a query is bounded by $O(m + km + k^2 m \log(k^2 m)) =
O(k^2 m \log(km)) = O(k^2 \log n \log(k \log n))$.

Preprocessing time and space. Building the initial tree $T$ takes $O(n)$ time and it
requires $O(n)$ space. Let $l(\nu)$ denote the number of vertices of $\mathrm{cr}(\nu)$. For each node $\nu$,
computing the additional information and storing it requires $O(l(\nu))$ space and $O(l(\nu)\log^2 l(\nu))$
time. Recall that $T$ is a balanced binary tree and for the nodes $\nu_1, \ldots, \nu_t$ contained in
one level of the tree it holds that $\sum_i l(\nu_i)$ sums up to $n$. Thus, computing and storing
the additional information takes an additional $O(n\log^3 n)$ time and $O(n\log n)$ space by
Theorem 7.7.

Quality of approximation.

Lemma 7.8. Given a polygonal curve $Z$ and a query curve $Q$ with $k$ segments, the value
$\Delta$ (see Eq. (5)) returned by the above data-structure is a constant factor approximation to
$d_{\mathcal{F}}(Q, Z\langle u, v\rangle)$.

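Proof: Clearly, $\Delta$ bounds the required distance from above, as one can extract a reparam-
eterization of $Q$ and $Z\langle u, v\rangle$ realizing $\Delta$; indeed, by the triangle inequality,
$d_{\mathcal{F}}(Q, Z\langle u, v\rangle) \le d_{\mathcal{F}}(Q, S) + d_{\mathcal{F}}(S, Z\langle u, v\rangle) \le d + \max_i \delta_i$. As such, we need to prove that $\Delta = O(r)$, where
$r = d_{\mathcal{F}}(Q, Z\langle u, v\rangle)$.

So, let $f : Q \to Z\langle u, v\rangle$ be the mapping realizing $r = d_{\mathcal{F}}(Q, Z\langle u, v\rangle)$, and let $Q_i =
f^{-1}(\mathrm{cr}(\nu_i))$, for $i = 1, \ldots, m$. Clearly, $r = \max_i d_{\mathcal{F}}(Q_i, \mathrm{cr}(\nu_i))$. Since $Q_i$ has at most $k$
vertices, by Theorem 7.7, we have

$$\frac{\delta_i}{11} \le d_{\mathcal{F}}(Q_i, \mathrm{cr}(\nu_i)) \le r, \quad \text{and} \quad d_{\mathcal{F}}(\mathrm{simpl}_K(\nu_i), \mathrm{cr}(\nu_i)) \le \delta_i, \qquad (6)$$

for $i = 1, \ldots, m$. In particular, we have $\delta_i \le 11r$. Now, by the triangle inequality, we have
that

$$d_{\mathcal{F}}(\mathrm{simpl}_K(\nu_i), Q_i) \le d_{\mathcal{F}}(\mathrm{simpl}_K(\nu_i), \mathrm{cr}(\nu_i)) + d_{\mathcal{F}}(\mathrm{cr}(\nu_i), Q_i) \le \delta_i + r \le 12r.$$

As such, $d = d_{\mathcal{F}}(S, Q) \le \max_i d_{\mathcal{F}}(\mathrm{simpl}_K(\nu_i), Q_i) \le 12r$. Now, $\Delta = d + \max_i \delta_i \le 12r + 11r =
23r$. ■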

The result. Putting the above together, we get the following result. We emphasize that $k$
is being specified together with the query curve, and the data-structure works for any value
of $k$.

Theorem 7.9. Given a polygonal curve $Z$ with $n$ edges, we can preprocess it, in time
$O(n\log^3 n)$, and space $O(n\log n)$. Now, given a query specified by
(i) a pair of points $u$ and $v$ on the curve $Z$,

(ii) the edges containing these two points, and

(iii) a query curve $Q$ with $k$ segments,
one can approximate $d_{\mathcal{F}}(Q, Z\langle u, v\rangle)$ up to a constant factor, in $O(k^2 \log n \log(k \log n))$ time.

Proof: The preprocessing is described and analyzed above. The query procedure needs to
be modified slightly since $u$ and $v$ are not necessarily vertices of $Z$. However, this can be
done the same way as for the initial data-structure in Theorem 6.5. Let $u'$, $v'$ be the first and
the last vertices of $Z$ contained in $Z\langle u, v\rangle$. We now extract the $m = O(\log n)$ nodes $\nu_1, \ldots, \nu_m$
of $T$, such that $X = uu' + \mathrm{cr}(\nu_1) + \cdots + \mathrm{cr}(\nu_m) + v'v$ is equal to $Z\langle u, v\rangle$. We continue with
the procedure as described above using this node set. The analysis of Lemma 7.8 applies
with minor modifications. ■



8 Conclusions 

In this paper, we presented algorithms for approximating the Frechet distance when one is
allowed to perform k or more shortcuts on the original curves. Surprisingly, for c-packed
curves it is possible to compute a constant factor approximation in a running time which is
near linear in the complexity of the input curves.

We also presented a way to compute an ordering of the vertices of the curve, such that 
any prefix of this (universal) ordering serves as a good approximation to the curve in the 
Frechet distance, and it is optimal (up to constant factors). 

Finally, we used this universal permutation to develop a data-structure that can quickly 
approximate (up to a constant factor) the (regular) Frechet distance between a query curve 
and the input curve. Surprisingly, the query time is logarithmic in the complexity of the 
original curve (and near quadratic in the complexity of the query curve). 

There are many open questions for further research. The most immediate questions being 
how to extend our result to the other definitions of a shortcut Frechet distance mentioned 
in the introduction and how to improve the approximation factor. The work in this paper 
is a step towards solving these more difficult questions. Furthermore, we think that one can 
combine our results with a previous extension to low-density graphs [CDG+11], and obtain
a near-linear time map-matching algorithm, which automatically introduces shortcuts where 
the road-network is incomplete. 
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A Background and standard definitions of the Frechet 
distance and the free space 

Some of the material covered here is standard, and follows the presentation in Driemel
et al. [DHW10]. A reparameterization is a one-to-one and continuous function $f :
[0,1] \to [0,1]$. It is orientation-preserving if it maps $f(0) = 0$ and $f(1) = 1$. Given
two reparameterizations $f$ and $g$ for two curves $X$ and $Y$, respectively, define their width
as $\mathrm{width}_{f,g}(X, Y) = \max_{\alpha \in [0,1]} \|X(f(\alpha)) - Y(g(\alpha))\|$. Given two curves $X$ and $Y$ in $\mathbb{R}^d$, the
Frechet distance between them is

$$d_{\mathcal{F}}(X, Y) = \min_{f:[0,1]\to[0,1],\; g:[0,1]\to[0,1]} \mathrm{width}_{f,g}(X, Y),$$

where $f$ and $g$ are orientation-preserving reparameterizations of the curves $X$ and $Y$, re-
spectively. The Frechet distance complies with the triangle inequality; that is, for any three
curves $X$, $Y$ and $\pi$ we have that $d_{\mathcal{F}}(Y, \pi) \le d_{\mathcal{F}}(Y, X) + d_{\mathcal{F}}(X, \pi)$.

The Frechet distance is defined only for oriented curves, as we need to match the start 
and end points of the curves. The orientation of the curves we use would be understood from 
the context. Alt and Godau [AG95] showed how to compute the Frechet distance between
two polygonal curves, with $n$ and $m$ edges, in $O(nm \log(mn))$ time.

Let $X$ and $Y$ be two polygonal curves and $\delta > 0$ a parameter. The $\delta$-free space of $X$
and $Y$ is defined as

$$\mathcal{D}_{\le\delta}(X, Y) = \left\{ (x, y) \in [0,1]^2 \;\middle|\; \|X(x) - Y(y)\| \le \delta \right\}.$$



We are interested only in polygonal curves, which we assume to have natural uniform
parameterizations. The square $[0, 1]^2$, which represents the parametric space, can be broken
into a (not necessarily uniform) grid called the free space diagram, where a vertical line
corresponds to a vertex of $X$ and a horizontal line corresponds to a vertex of $Y$.

Every two segments of $X$ and $Y$ define a free space cell
in this grid. In particular, let $C_{i,j} = C_{i,j}(X, Y)$ denote the free
space cell that corresponds to the $i$th edge of $X$ and the $j$th
edge of $Y$. The cell $C_{i,j}$ is located in the $i$th column and $j$th
row of this grid.

It is known that the free space, for a fixed $\delta$, inside such
a cell $C_{i,j}$ (i.e., $\mathcal{D}_{\le\delta}(X, Y) \cap C_{i,j}$) is the clipping of an affine
transformation of a disk to the cell [AG95], see the figure on
the right; as such, it is convex and of constant complexity. Let $I^h_{i,j}$ denote the horizontal
free space interval at the top boundary of $C_{i,j}$, and $I^v_{i,j}$ denote the vertical free space
interval at the right boundary.
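
A free space interval on a cell boundary is the set of parameters at which an edge of one curve lies within distance $\delta$ of a vertex of the other. The following Python sketch computes it in the plane by solving the corresponding quadratic inequality; the function name `free_interval` and the parameterization of the edge by $t \in [0,1]$ are assumptions of this illustration.

```python
import math

def free_interval(p, a, b, delta):
    """Parameters t in [0, 1] with |a + t(b - a) - p| <= delta, as (lo, hi),
    or None if the edge ab stays farther than delta from the vertex p."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    fx, fy = a[0] - p[0], a[1] - p[1]
    # |f + t d|^2 <= delta^2  <=>  A t^2 + B t + C <= 0
    A = dx * dx + dy * dy
    B = 2.0 * (fx * dx + fy * dy)
    C = fx * fx + fy * fy - delta * delta
    if A == 0:                      # degenerate edge: a single point
        return (0.0, 1.0) if C <= 0 else None
    disc = B * B - 4.0 * A * C
    if disc < 0:
        return None                 # the supporting line misses the disk
    r = math.sqrt(disc)
    lo = max(0.0, (-B - r) / (2.0 * A))
    hi = min(1.0, (-B + r) / (2.0 * A))
    return (lo, hi) if lo <= hi else None
```

When the edge is tangent to the disk of radius $\delta$ around the vertex, the returned interval degenerates to a single parameter value.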

We define the complexity of the relevant free space, for distance $\delta$, denoted by $N_{\le\delta}(X, Y)$,
as the total number of grid cells that have a non-empty intersection with $\mathcal{D}_{\le\delta}(X, Y)$.

Observation A.1. Given two segments $pq$ and $uv$, it holds that $d_{\mathcal{F}}(pq, uv) = \max(\|u - p\|,
\|v - q\|)$. To see this, consider the uniform parameterization $p(t) = tp + (1 - t)q$ and
$u(t) = tu + (1 - t)v$, for $t \in [0, 1]$. It is easy to verify that $f(t) = \|p(t) - u(t)\|$ is convex,
and as such $f(t) \le \max(f(0), f(1))$, for any $t \in [0, 1]$.
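
Observation A.1 is easy to check numerically. The Python sketch below compares the closed formula with the largest gap along the uniform parameterization, sampled at finitely many values of $t$; the function names and the sampling resolution are illustrative choices, and the sampled value only confirms the upper bound given by the convexity argument.

```python
import math

def segment_frechet(p, q, u, v):
    """Frechet distance between segments pq and uv (Observation A.1)."""
    return max(math.dist(p, u), math.dist(q, v))

def max_uniform_gap(p, q, u, v, steps=1000):
    """Largest distance |p(t) - u(t)| over sampled t in [0, 1], where
    p(t) = t p + (1 - t) q and u(t) = t u + (1 - t) v."""
    worst = 0.0
    for s in range(steps + 1):
        t = s / steps
        a = (t * p[0] + (1 - t) * q[0], t * p[1] + (1 - t) * q[1])
        b = (t * u[0] + (1 - t) * v[0], t * u[1] + (1 - t) * v[1])
        worst = max(worst, math.dist(a, b))
    return worst
```

Since the endpoints must be matched to each other by any orientation-preserving reparameterization, the larger endpoint distance is also a lower bound, which is why the two quantities agree.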



Free space events. To compute the Frechet distance, consider increasing $\delta$ from $0$ to $\infty$.
As $\delta$ increases, structural changes to the free space happen. We are interested in the radii
(i.e., the values of $\delta$) of these events.

Figure 8: Two curves $X$ and $Y$ and their free space diagram $\mathcal{D}_{\le\delta}(X, Y)$, where $p =
X(s)$, $q = X(s')$ and $r = Y(t)$. Here, $\delta$ is the minimal free space parameter, such that a
monotone path exists, i.e., in this example $d_{\mathcal{F}}(X, Y)$ coincides with a monotonicity event.




Consider a segment $u \in X$ and a vertex $p \in Y$. A vertex-
edge event corresponds to the minimum value $\delta$ such that $u$ is
tangent to $b(p, \delta)$. In the free space diagram, this corresponds to
the event that a free space interval consists of one point only. The
line supporting this boundary edge corresponds to the vertex, and
the other dimension corresponds to the edge. Naturally, the event could happen at a vertex
of $u$.

The second type of event, a monotonicity event, corresponds to a value $\delta$ for which a
monotone subpath inside $\mathcal{D}_{\le\delta}$ becomes feasible, see Figure 8. Geometrically, this corresponds
to two vertices $p$ and $q$ on one curve and a directed segment $u$ on the other curve such that:
(1) $u$ passes through the intersection $s$ of $S(p, \delta) \cap S(r, \delta)$, and (2) $u$ intersects $b(r, \delta)$ first
and $b(p, \delta)$ second, where $p$ comes before $q$ in the order along the curve $X$.
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